arXiv:1507.01542v2 [stat.ME] 18 Jul 2016


Treatment Effects on Ordinal Outcomes: Causal Estimands and Sharp Bounds


Jiannan Lu, Peng Ding and Tirthankar Dasgupta


Abstract 

Under the potential outcomes framework, causal effects are defined as comparisons between 
the treatment and control potential outcomes. Unfortunately, however, the average causal effect, 
often the parameter of interest, is generally not well defined for ordinal outcomes. To address 
this problem, we propose to use two causal parameters that are defined as the probabilities that 
the treatment is beneficial and strictly beneficial for the experimental units. These two causal 
parameters are well defined for any outcomes and of particular interest for ordinal outcomes. 

These parameters, though of scientific importance and interest, depend on the association
between the potential outcomes and are therefore, without further assumptions, not identifiable
from the observed data. In this paper, for ordinal outcomes we derive the sharp bounds of the
two causal parameters using only the marginal distributions, without imposing any assumptions
on the joint distribution of the potential outcomes. Because we define the causal effects and
derive the bounds based on the potential outcomes themselves, the theoretical results can be
incorporated into any models of the potential outcomes, and are applicable to randomized
experiments, unconfounded observational studies, and randomized experiments with noncompliance.

We illustrate our methodology via numerical examples and real-life applications.

1. Introduction


The potential outcomes framework (Neyman 1923; Rubin 1974) permits defining causal effects as
comparisons between the potential outcomes under treatment and control. The average causal
effect, generally the parameter of interest ever since the seminal work of Neyman (1923), is not
applicable to ordinal outcomes, because average outcomes themselves are not well defined. Ordinal
outcomes are common in applied research, and the generalized linear model literature (cf. Agresti
2010) has discussed ordinal outcomes extensively. Unfortunately, however, although the model
parameters of the generalized linear models are useful summaries of the data, they are often not
direct measures of the causal effects of interest (Freedman 2008). Moreover, statistical inference
often requires correctly-specified models, and when the generalized linear model assumptions are
violated, the interpretations of the parameters often become obscure. The causal inference
literature mainly focuses on the average causal effect, and does not give a thorough investigation
of ordinal outcomes. Rosenbaum (2001) discussed causal inference for ordinal outcomes under the
monotonicity assumption that the treatment is beneficial for all units. Cheng (2009) and Agresti
(2010) discussed various causal parameters under the assumption of independent potential
outcomes. Volfovsky et al. (2015) exploited a Bayesian strategy, requiring a full parametric model
on the joint values of the potential outcomes. Diaz et al. (2016) proposed to use a causal parameter
that did not rely on the assumption of the proportional odds model for ordinal outcomes.

For ordinal outcomes, we propose to use two causal parameters measuring the probabilities
that the treatment is beneficial and strictly beneficial for the experimental units, which play
important roles in decision and policy making for randomized evaluations with ordinal outcomes.
Because these two causal parameters depend on the association between the treatment and control
potential outcomes, they are generally not identifiable from the observed data. Without imposing
any assumptions about the underlying distributions of, or the association between, the potential
outcomes, we sharply bound them by using the marginal distributions of the potential outcomes.
Mathematically, deriving the sharp bounds of the proposed causal parameters is closely related to
a classical probability problem posed by A. N. Kolmogorov (cf. Nelsen 2006), which is a non-trivial
task for ordinal outcomes. We believe this is a major contribution to the literature.

Because these bounds hold for causal parameters defined by the potential outcomes themselves,
they hold without any modeling assumptions, and therefore can be incorporated flexibly into any
chosen models of the potential outcomes in practice. Furthermore, they are directly applicable to
randomized experiments, unconfounded observational studies, and randomized experiments with
noncompliance. In randomized experiments, we can identify the bounds immediately, and
additionally, sharpen the bounds by exploiting covariate information under certain modeling
assumptions. In observational studies, if the treatment assignment is unconfounded given the
observed covariates, we can identify the bounds by propensity score weighting (Rosenbaum and
Rubin 1983; Hirano et al. 2003). Furthermore, we extend the theory to accommodate noncompliance,
because it often arises in practical randomized evaluations.

The paper proceeds as follows. Section 2 introduces the potential outcomes framework for causal
inference for ordinal outcomes, and proposes two causal parameters that are natural measures of
causal effects and are of practical importance. Section 3 derives the sharp bounds of the proposed
causal parameters. Section 4 generalizes the bounds to noncompliance. Section 5 discusses
statistical inference of the bounds. Sections 6 and 7 present numerical and real examples to
illustrate the theoretical results. We conclude in Section 8, prove the main theorem in the
Appendix, and relegate other technical details to the Supplementary Material.

2. Causal Inference for Ordinal Outcomes 
2.1. Potential Outcomes 

We consider a study with $N$ units, a binary treatment, and an ordinal outcome with $J$ categories
labeled as $0, \ldots, J-1$, where $0$ and $J-1$ represent the worst and best categories. Under the
Stable Unit Treatment Value Assumption (Rubin 1980) that there is only one version of the treatment
and no interference among the units, we define the pair $\{Y_i(1), Y_i(0)\}$ as the potential
outcomes of the $i$th unit under treatment and control, respectively. Let

$$p_{kl} = \mathrm{pr}\{Y_i(1) = k, Y_i(0) = l\} \quad (k, l = 0, \ldots, J-1)$$

denote the proportion or probability of units whose potential outcome is $k$ under treatment and $l$
under control. The probability notation "pr(·)" is either for a finite population of $N$ units or for a
super population, depending on the question of interest. The probability matrix
$P = (p_{kl})_{0 \le k, l \le J-1}$ summarizes the joint distribution of the potential outcomes. We
denote the row and column sums of $P$ by

$$p_{k+} = \sum_{l'=0}^{J-1} p_{kl'}, \quad p_{+l} = \sum_{k'=0}^{J-1} p_{k'l} \quad (k, l = 0, 1, \ldots, J-1).$$

The vectors $p_1 = (p_{0+}, \ldots, p_{J-1,+})^T$ and $p_0 = (p_{+0}, \ldots, p_{+,J-1})^T$ characterize
the marginal distributions of the potential outcomes under treatment and control, respectively.


2.2. Causal Parameters for Ordinal Outcomes 


We discuss the existing causal parameters for ordinal outcomes, and the motivation behind
proposing new ones. Any causal parameter is a function of the probability matrix $P$. Unfortunately,
the average causal effect is not well defined for ordinal outcomes. Instead, we can use the
distributional causal effects (cf. Ju and Geng 2010)

$$\Delta_j = \mathrm{pr}\{Y_i(1) \ge j\} - \mathrm{pr}\{Y_i(0) \ge j\} = \sum_{k \ge j} p_{k+} - \sum_{l \ge j} p_{+l} \quad (j = 0, \ldots, J-1) \tag{1}$$

to measure the difference between the marginal distributions of potential outcomes at different
levels of $j$. However, unless the distributional causal effects $\Delta_j$'s have the same sign for
all $j$, it is difficult to decide whether the treatment or the control is preferable. We may use a
weighted sum $\sum_j w_j \Delta_j$ to measure the treatment effect, but such a measure depends
crucially on the weights $w_j$'s. We illustrate this point by using the following numerical example.

Example 1. Let $p_1 = (1/5, 3/5, 1/5)^T$ and $p_0 = (2/5, 1/5, 2/5)^T$, with $\Delta_0 = 0$,
$\Delta_1 = 1/5$ and $\Delta_2 = -1/5$. The treatment is beneficial at level 1, but not at level 2.
In this case, distributional causal effects do not provide straightforward guidance for decision making.
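
The arithmetic in Example 1 can be checked with a few lines of Python. This is only an illustrative sketch: the function name `distributional_effects` is ours, not the paper's, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction as F

def distributional_effects(p1, p0):
    """Delta_j = pr{Y(1) >= j} - pr{Y(0) >= j} for j = 0, ..., J-1."""
    J = len(p1)
    return [sum(p1[j:]) - sum(p0[j:]) for j in range(J)]

# Marginal distributions from Example 1 (treatment, control).
p1 = [F(1, 5), F(3, 5), F(1, 5)]
p0 = [F(2, 5), F(1, 5), F(2, 5)]

deltas = distributional_effects(p1, p0)
# The effects point in one direction only if they all share a sign.
same_sign = all(d >= 0 for d in deltas) or all(d <= 0 for d in deltas)
print([str(d) for d in deltas], same_sign)
```

Here the effects change sign across levels, which is exactly the situation in which they give no clear ranking of treatment and control.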


When $\Delta_j \ge 0$ for all $j$, $Y(1)$ stochastically dominates $Y(0)$. When this pattern appears
in real data applications, practitioners often fit a proportional odds model (Agresti 2010) and
summarize the overall effectiveness of the treatment by a single odds ratio parameter. Although such
a summary parameter may be useful in certain cases, its causal interpretation is unclear. Moreover,
when the data do not present the stochastic dominance pattern, as in Example 1, summarizing the
treatment effect by the single odds ratio parameter of a wrong model often gives misleading
conclusions.

Volfovsky et al. (2015) studied the conditional medians

$$m_j = \mathrm{med}\{Y_i(1) \mid Y_i(0) = j\} \quad (j = 0, \ldots, J-1), \tag{2}$$

which is a set containing all values of $k$ such that $\sum_{k'=0}^{k} p_{k'j} \ge p_{+j}/2$ and
$\sum_{k'=k}^{J-1} p_{k'j} \ge p_{+j}/2$. By definition, the conditional medians may not be unique, and
they are only well defined for $j$ with $p_{+j} > 0$. Moreover, they are not direct measures of the
treatment effect itself.

We propose to use two causal parameters that measure the probabilities that the treatment is
beneficial and strictly beneficial for the experimental units:

$$\tau = \mathrm{pr}\{Y_i(1) \ge Y_i(0)\} = \sum\sum_{k \ge l} p_{kl}, \quad \eta = \mathrm{pr}\{Y_i(1) > Y_i(0)\} = \sum\sum_{k > l} p_{kl}. \tag{3}$$

The causal parameters $\tau$ and $\eta$ are measures of causal effects that are well defined for any
types of outcomes, and of particular interest to ordinal outcomes. Similar causal measures appeared
in biomedical (Gadbury and Iyer 2000; Newcombe 2006a,b; Zhou 2008; Huang et al. 2015; Demidenko
2016) and social sciences (Heckman et al. 1997; Djebbari and Smith 2008; Fan and Park 2010; Fan et
al. 2014). In practice, we suggest using the pair $(\tau, \eta)$ as measures of causal effects on
ordinal outcomes. For example, if the sharp null holds, i.e., $Y_i(1) = Y_i(0)$ for all units $i$, then
$\tau = 1$ and $\eta = 0$. In this case, using only $\tau$ may be misleading. Nevertheless, we argue
that the parameter $\tau$ is as important as $\eta$. Because $1 - \tau = \mathrm{pr}\{Y_i(0) > Y_i(1)\}$,
the value of $\tau$ determines the probability that the control is strictly beneficial for the
experimental units. Due to the symmetry of treatment and control labels, $\tau$ and $\eta$ are equally
useful for real data analysis.

We use the following numerical example to show the values of $m_j$, $\tau$ and $\eta$.

Example 2. Consider the following probability matrix:

$$P = \begin{pmatrix} 0 & 1/6 & 1/6 \\ 0 & 1/6 & 0 \\ 0 & 1/3 & 1/6 \end{pmatrix}.$$

In this case, $m_0$ is not well defined, $m_1$ is 1, and $m_2 = \{0, 1, 2\}$. However, we have
$\tau = 2/3$ and $\eta = 1/3$, i.e., two thirds of the population benefit from the treatment and one
third strictly benefit.

The causal parameters $\tau$ and $\eta$ in (3) are well defined for both finite populations and super
populations. They are functions of the potential outcomes, which distinguishes them from the
parameters in super population models. When the models are mis-specified, the interpretations of
the corresponding model parameters are often obscure. We have already discussed this issue for
the proportional odds model. Our causal parameters $\tau$ and $\eta$ are closely related to the
relative treatment effect $\alpha = \mathrm{pr}\{Y_i(1) > Y_i(0)\} - \mathrm{pr}\{Y_i(1) < Y_i(0)\}$
previously studied under the assumption of independent potential outcomes (Agresti 2010). This
relative treatment effect $\alpha$ and the causal parameters we proposed have a simple algebraic
relationship, i.e., $\alpha = \tau + \eta - 1$. Therefore, our newly proposed causal parameters $\tau$
and $\eta$ determine $\alpha$. Furthermore, these causal parameters are also related to the notion of
"probability of causation" (Pearl 2009), because their direct interpretations are the probabilities
or proportions that the treatment affects the outcome on the individual level. It is for these
reasons that we advocate using $\tau$ and $\eta$ as causal effect measures for ordinal outcomes.
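
As a quick numerical check, the sketch below computes the two proposed parameters for the matrix in Example 2 and confirms the stated relationship with the relative treatment effect. The helper name `tau_eta` is illustrative only; rows of `P` index the treatment outcome and columns the control outcome, as in the definition of the probability matrix.

```python
from fractions import Fraction as F

def tau_eta(P):
    """tau = pr{Y(1) >= Y(0)}, eta = pr{Y(1) > Y(0)} for a joint probability matrix P."""
    J = len(P)
    cells = [(k, l) for k in range(J) for l in range(J)]
    tau = sum(P[k][l] for k, l in cells if k >= l)
    eta = sum(P[k][l] for k, l in cells if k > l)
    return tau, eta

# Probability matrix from Example 2.
P = [[F(0), F(1, 6), F(1, 6)],
     [F(0), F(1, 6), F(0)],
     [F(0), F(1, 3), F(1, 6)]]

tau, eta = tau_eta(P)
# Relative treatment effect computed directly: pr{Y(1) > Y(0)} - pr{Y(1) < Y(0)}.
worse = sum(P[k][l] for k in range(3) for l in range(3) if k < l)
alpha = eta - worse
print(tau, eta, alpha, alpha == tau + eta - 1)
```

The identity holds for any joint distribution because the probability that the treatment is strictly worse equals one minus the probability that it is beneficial.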


3. Sharp Bounds on the Proposed Causal Estimands for Ordinal Outcomes 
3.1. Closed-Form Expressions of Sharp Bounds 

The definitions of $\tau$ and $\eta$ involve the association between the treatment and control
potential outcomes. Because we can never jointly measure the potential outcomes, the observed data
do not provide full information about their association, rendering the causal parameters $\tau$ and
$\eta$ not identifiable. To partially circumvent this difficulty, we focus on the sharp bounds of
$\tau$ and $\eta$, which are the minimal and maximal values of $\tau$ and $\eta$ under the constraints
of the following marginal distributions:

$$\sum_{l'=0}^{J-1} p_{kl'} = p_{k+}, \quad \sum_{k'=0}^{J-1} p_{k'l} = p_{+l}, \quad p_{kl} \ge 0 \quad (k, l = 0, \ldots, J-1). \tag{4}$$

The sharp bounds depend only on the marginal distributions of the potential outcomes. Deriving
the sharp bounds is equivalent to solving linear programming problems, because the objective
functions in (3) and the constraints in (4) are all linear. Previous literature (Huang et al. 2015)
used a numerical method to solve the linear programming problem for $\eta$. Fortunately, we can
derive closed-form solutions of the above linear programming problems for both $\tau$ and $\eta$.

In this paper, we not only derive the sharp bounds of the causal parameters of interest, but also
construct explicitly the probability matrices that attain these bounds. First we state a theorem on
the sharp bounds of $\tau$, which is the foundation for the remaining theorems and corollaries.
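
For small problems, the linear program can be solved by brute force, which gives an independent check on any closed-form answer. The sketch below is not the paper's method. It assumes the margins are multiples of a common denominator (here 1/5, using the margins of Example 1) and relies on a standard fact: the constraint set (4) is a transportation polytope, whose vertices are integer tables when the margins are integers, so a linear objective attains its minimum and maximum at one of them. The function name is illustrative.

```python
from itertools import product

def extreme_tau_counts(rows, cols):
    """Min and max of sum_{k >= l} n[k][l] over nonnegative integer tables
    with row sums `rows` (treatment) and column sums `cols` (control)."""
    J, total = len(rows), sum(rows)
    assert total == sum(cols)
    values = []
    # Enumerate the top-left (J-1) x (J-1) block; the margins fix the remaining cells.
    for free in product(range(total + 1), repeat=(J - 1) ** 2):
        n = [[0] * J for _ in range(J)]
        for idx, v in enumerate(free):
            n[idx // (J - 1)][idx % (J - 1)] = v
        for k in range(J - 1):
            n[k][J - 1] = rows[k] - sum(n[k][:J - 1])
        for l in range(J):
            n[J - 1][l] = cols[l] - sum(n[k][l] for k in range(J - 1))
        if min(min(row) for row in n) < 0:
            continue  # not a valid table
        values.append(sum(n[k][l] for k in range(J) for l in range(J) if k >= l))
    return min(values), max(values)

# Margins of Example 1 multiplied by 5: p1 = (1, 3, 1)/5 and p0 = (2, 1, 2)/5.
lo, hi = extreme_tau_counts(rows=(1, 3, 1), cols=(2, 1, 2))
print(lo / 5, hi / 5)
```

The enumeration grows quickly with the number of categories and the denominator, so it is useful only as a sanity check on small examples.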

Theorem 1. The sharp lower and upper bounds of $\tau$ are

$$\tau_L = \max_{0 \le j \le J-1} (p_{+j} + \Delta_j), \quad \tau_U = 1 + \min_{0 \le j \le J-1} \Delta_j.$$

The bounds in Theorem 1 are closely related to the distributional causal effects in (1), and
we can interpret them as the conservative and optimistic estimates of the probability that the
treatment is beneficial to the outcome. Furthermore, the following corollary demonstrates that the
sharp upper bound $\tau_U$ is related to the stochastic dominance assumption, i.e., $\Delta_j \ge 0$
for all $j$.

Corollary 1. The upper bound $\tau_U = 1$ if and only if the marginal probabilities $p_1$ and $p_0$
satisfy the stochastic dominance assumption.


The above corollary implies that for any marginal probabilities satisfying the stochastic
dominance assumption, there exists a lower triangular probability matrix $P$ that corresponds to a
population satisfying the monotonicity assumption, i.e., $Y_i(1) \ge Y_i(0)$ for all $i$. Strassen
(1965) and Rosenbaum (2001) demonstrated this result, and Theorem 1 extends the previous result
without imposing the stochastic dominance assumption. Moreover, Theorem 1 also justifies the
use of $\min_{0 \le j \le J-1} \Delta_j$ as a measure of the deviation from the stochastic dominance
assumption (Scharfstein et al. 2004).

Next we consider bounding $\eta$. Realizing that $\eta = 1 - \mathrm{pr}\{Y_i(0) \ge Y_i(1)\}$, we can
derive bounds for $\mathrm{pr}\{Y_i(0) \ge Y_i(1)\}$ by switching the treatment and control labels and
applying Theorem 1.

Theorem 2. The sharp lower and upper bounds of $\eta$ are

$$\eta_L = \max_{0 \le j \le J-1} \Delta_j, \quad \eta_U = 1 + \min_{0 \le j \le J-1} (\Delta_j - p_{j+}). \tag{5}$$
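
The closed-form expressions in Theorems 1 and 2 translate directly into code. The following is a minimal sketch with an illustrative function name; it takes the two marginal distributions as lists of exact fractions and evaluates the four bounds for two sets of margins.

```python
from fractions import Fraction as F

def sharp_bounds(p1, p0):
    """Closed-form bounds (tau_L, tau_U, eta_L, eta_U) from the two marginals."""
    J = len(p1)
    delta = [sum(p1[j:]) - sum(p0[j:]) for j in range(J)]   # Delta_j
    tau_L = max(p0[j] + delta[j] for j in range(J))          # max_j (p_{+j} + Delta_j)
    tau_U = 1 + min(delta)                                   # 1 + min_j Delta_j
    eta_L = max(delta)                                       # max_j Delta_j
    eta_U = 1 + min(delta[j] - p1[j] for j in range(J))      # 1 + min_j (Delta_j - p_{j+})
    return tau_L, tau_U, eta_L, eta_U

# Margins without stochastic dominance.
b1 = sharp_bounds([F(1, 5), F(3, 5), F(1, 5)], [F(2, 5), F(1, 5), F(2, 5)])
# Margins with stochastic dominance (all Delta_j >= 0).
b2 = sharp_bounds([F(1, 5), F(1, 5), F(3, 5)], [F(3, 5), F(1, 5), F(1, 5)])
print([str(x) for x in b1])
print([str(x) for x in b2])
```

In the second case the upper bound of the first parameter equals 1, in line with Corollary 1.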


Deriving the sharp bounds of $\tau$ and $\eta$ is related to a classical probability problem, first
posed by A. N. Kolmogorov (cf. Nelsen 2006): how to bound the distribution of the sum (or difference)
of two random variables with fixed marginal distributions? When $\delta = Y(1) - Y(0)$ is well
defined, as for continuous outcomes, our causal parameters $\tau$ and $\eta$ are determined by the
distribution of the causal effect $\delta$, the difference between the treatment and control
potential outcomes. The sharp bounds on the distribution of $\delta$ have been obtained by Makarov
(1982), Rüschendorf (1982) and Frank et al. (1987), and recently reviewed by Fan and Park (2010) and
Fan et al. (2014). However, their results and proofs apply to the cases when $Y(1) - Y(0)$ is well
defined, and the bounds in Theorems 1 and 2 hold in general including both continuous and ordinal
outcomes.

In the proofs of Theorems 1 and 2, we construct the probability matrices that achieve the lower
and upper bounds of $\tau$ and $\eta$, which correspond to negatively associated and positively
associated potential outcomes. They are both extreme scenarios. In practice, researchers may also be
interested in the case with independent potential outcomes (Neyman 1923; Cheng 2009; Agresti 2010;
Ding and Dasgupta 2016), i.e., $p_{kl} = p_{k+} p_{+l}$ for all $k$ and $l$. With independent potential
outcomes, we can identify $\tau$ and $\eta$ from the marginal distributions of the potential outcomes.

Theorem 3. With independent potential outcomes,

$$\tau_I = \sum\sum_{k \ge l} p_{k+} p_{+l}, \quad \eta_I = \sum\sum_{k > l} p_{k+} p_{+l}.$$

Furthermore, $\tau_L \le \tau_I \le \tau_U$ and $\eta_L \le \eta_I \le \eta_U$.
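
Theorem 3 is straightforward to evaluate numerically. The sketch below, with an illustrative function name, forms the products of the marginal probabilities and sums them over the relevant cells, using the margins of Example 1 as input.

```python
from fractions import Fraction as F

def independent_tau_eta(p1, p0):
    """Values of tau and eta when the joint distribution is the product of the marginals."""
    J = len(p1)
    tau_I = sum(p1[k] * p0[l] for k in range(J) for l in range(J) if k >= l)
    eta_I = sum(p1[k] * p0[l] for k in range(J) for l in range(J) if k > l)
    return tau_I, eta_I

p1 = [F(1, 5), F(3, 5), F(1, 5)]   # treatment marginal
p0 = [F(2, 5), F(1, 5), F(2, 5)]   # control marginal
tau_I, eta_I = independent_tau_eta(p1, p0)
print(tau_I, eta_I)
```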

In cases where negatively associated potential outcomes are unlikely, we can use $\tau_I$ and
$\eta_I$ as the lower bounds of $\tau$ and $\eta$. Below we give two numerical examples to illustrate
Theorems 1 to 3.

Example 3. The marginal probabilities $p_1 = (1/5, 3/5, 1/5)^T$ and $p_0 = (2/5, 1/5, 2/5)^T$ do not
satisfy the stochastic dominance assumption, because $\Delta_0 = 0$, $\Delta_1 = 1/5 > 0$ and
$\Delta_2 = -1/5 < 0$. Theorems 1 and 3 imply that $\tau_L = 2/5$, $\tau_I = 16/25$, and
$\tau_U = 4/5$. The probability matrices corresponding to negatively associated, independent, and
positively associated potential outcomes achieving these values are respectively

$$P_1 = \begin{pmatrix} 0 & 1/5 & 0 \\ 1/5 & 0 & 2/5 \\ 1/5 & 0 & 0 \end{pmatrix}, \quad
P_2 = \begin{pmatrix} 2/25 & 1/25 & 2/25 \\ 6/25 & 3/25 & 6/25 \\ 2/25 & 1/25 & 2/25 \end{pmatrix}, \quad
P_3 = \begin{pmatrix} 1/5 & 0 & 0 \\ 1/5 & 1/5 & 1/5 \\ 0 & 0 & 1/5 \end{pmatrix}. \tag{6}$$

Similarly, Theorems 2 and 3 imply $\eta_L = 1/5$, $\eta_I = 9/25$, and $\eta_U = 3/5$.

Example 4. The marginal probabilities $p_1 = (1/5, 1/5, 3/5)^T$ and $p_0 = (3/5, 1/5, 1/5)^T$ satisfy
the stochastic dominance assumption, because $\Delta_0 = 0$, $\Delta_1 = 2/5 > 0$ and
$\Delta_2 = 2/5 > 0$. Theorems 1 and 3 imply $\tau_L = 3/5$, $\tau_I = 22/25$, and $\tau_U = 1$. The
probability matrices corresponding to negatively associated, independent, and positively associated
potential outcomes achieving these values are respectively

$$P_4 = \begin{pmatrix} 0 & 1/5 & 0 \\ 0 & 0 & 1/5 \\ 3/5 & 0 & 0 \end{pmatrix}, \quad
P_5 = \begin{pmatrix} 3/25 & 1/25 & 1/25 \\ 3/25 & 1/25 & 1/25 \\ 9/25 & 3/25 & 3/25 \end{pmatrix}, \quad
P_6 = \begin{pmatrix} 1/5 & 0 & 0 \\ 0 & 1/5 & 0 \\ 2/5 & 0 & 1/5 \end{pmatrix}. \tag{7}$$

Similarly, Theorems 2 and 3 imply $\eta_L = 2/5$, $\eta_I = 3/5$, and $\eta_U = 4/5$.

As demonstrated in Examples 3 and 4, the bounds of $\tau$ (or $\eta$) generally do not shrink to a
point. However, there are some special cases in which the lower and upper bounds of $\tau$ (or
$\eta$) are identical. The sufficient and necessary conditions appear to be technical, and therefore
we relegate the discussion to the Supplementary Material.


3.2. Covariate Adjustment 


Pretreatment covariates can help sharpen the bounds (cf. Grilli and Mealli 2008; Lee 2009; Long and
Hudgens 2013; Mealli and Pacini 2013). Without loss of generality, we focus only on the bounds of
$\tau$. Within each level of the pretreatment covariates $X = x$,

$$\tau(x) = \mathrm{pr}\{Y(1) \ge Y(0) \mid X = x\}$$

is the conditional probability that the treatment is beneficial. We can obtain the conditional lower
and upper bounds $\tau_L(x)$ and $\tau_U(x)$ given the covariate level $x$, then average them over the
covariate distribution $F(x)$, and finally obtain the adjusted bounds for $\tau$:

$$\tau'_L = \int \tau_L(x) F(dx), \quad \tau'_U = \int \tau_U(x) F(dx). \tag{8}$$

Theorem 4. The adjusted bounds are tighter, i.e., $\tau_L \le \tau'_L \le \tau'_U \le \tau_U$.

Theorem 4 holds intuitively, because the existence of covariates imposes more distributional
restrictions on the observed data. We use the following example to illustrate Theorem 4.

Example 5. Consider a population consisting of two sub-populations of equal sizes, labeled by a
binary covariate $X$. Assume that the potential outcomes of sub-populations $X = 1$ and $X = 0$ are
the independent potential outcomes in Examples 3 and 4. Simple algebra gives the following joint
distribution, marginal distributions, and $\tau$ of the potential outcomes:

$$P = \begin{pmatrix} 1/10 & 1/25 & 3/50 \\ 9/50 & 2/25 & 7/50 \\ 11/50 & 2/25 & 1/10 \end{pmatrix}, \quad
p_1 = (1/5, 2/5, 2/5)^T, \quad p_0 = (1/2, 1/5, 3/10)^T, \quad \tau = 19/25.$$

Without covariate information, Theorem 1 implies $\tau_L = 1/2$ and $\tau_U = 1$. However, if we first
obtain the bounds for the two sub-populations and then average over them, we obtain sharper
covariate adjusted bounds $\tau'_L = \tau_L(1)/2 + \tau_L(0)/2 = 1/2$, and
$\tau'_U = \tau_U(1)/2 + \tau_U(0)/2 = 9/10$.
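
The comparison in Example 5 can be reproduced with a short script. This is a sketch under the example's own setup (two equally weighted strata with known margins); the names `tau_bounds` and `strata` are illustrative.

```python
from fractions import Fraction as F

def tau_bounds(p1, p0):
    """Bounds of Theorem 1: (max_j (p_{+j} + Delta_j), 1 + min_j Delta_j)."""
    J = len(p1)
    delta = [sum(p1[j:]) - sum(p0[j:]) for j in range(J)]
    return max(p0[j] + delta[j] for j in range(J)), 1 + min(delta)

# (weight, treatment marginal, control marginal) for the strata X = 1 and X = 0.
strata = [
    (F(1, 2), [F(1, 5), F(3, 5), F(1, 5)], [F(2, 5), F(1, 5), F(2, 5)]),
    (F(1, 2), [F(1, 5), F(1, 5), F(3, 5)], [F(3, 5), F(1, 5), F(1, 5)]),
]

# Unadjusted bounds: pool the margins first, then apply Theorem 1.
pooled_p1 = [sum(w * m1[k] for w, m1, _ in strata) for k in range(3)]
pooled_p0 = [sum(w * m0[k] for w, _, m0 in strata) for k in range(3)]
unadjusted = tau_bounds(pooled_p1, pooled_p0)

# Adjusted bounds: apply Theorem 1 within each stratum, then average.
adjusted = tuple(sum(w * tau_bounds(m1, m0)[i] for w, m1, m0 in strata) for i in range(2))
print(unadjusted, adjusted)
```

Only the upper bound tightens in this example; the lower bound is unchanged.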


3.3. Identifying the Bounds from Observed Data 


Previous subsections discussed the causal parameters $\tau$ and $\eta$ and their bounds. The causal
parameters depend on the joint distribution of the potential outcomes, but the bounds depend
only on the marginal distributions of the potential outcomes. In practice, the observed data
provide full information about only the marginal distributions. Therefore, point estimates of the
bounds can be obtained, although the causal parameters themselves are only partially identified
(cf. Romano and Shaikh 2008, 2010; Richardson et al. 2014).

For unit $i = 1, \ldots, N$, let the treatment indicator be $Z_i$, and the observed outcome be
$Y_i^{obs} = Z_i Y_i(1) + (1 - Z_i) Y_i(0)$. To avoid conceptual complications, we consider treatment
assignments that satisfy the ignorability assumption (Rosenbaum and Rubin 1983), i.e.,
$Z \perp \{Y(1), Y(0)\} \mid X$. The ignorability assumption holds by the design of randomized
experiments, and cannot be validated in observational studies. Under the ignorability assumption, we
define the propensity score as $e(X) = \mathrm{pr}(Z = 1 \mid X)$, which is a constant independent of
$X$ in completely randomized experiments. We can identify the marginal distributions of the potential
outcomes by

$$\mathrm{pr}\{Y(1) = k\} = E\left\{ \frac{Z \, 1(Y^{obs} = k)}{e(X)} \right\}, \quad
\mathrm{pr}\{Y(0) = l\} = E\left\{ \frac{(1 - Z) \, 1(Y^{obs} = l)}{1 - e(X)} \right\}.$$

By replacing the expectations by their sample analogues, we obtain the moment estimators for the
marginal distributions. We defer more detailed discussion about statistical inference to Section 5.
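
The moment estimators can be sketched as follows. The function name `ipw_marginals` and the toy data are hypothetical, and the propensity scores are taken as known; in an observational study they would have to be estimated first. Note that these weighted estimates need not sum exactly to one in finite samples.

```python
from fractions import Fraction as F

def ipw_marginals(z, y, e, J):
    """Moment estimators of pr{Y(1) = k} and pr{Y(0) = l} by inverse propensity weighting.

    z: treatment indicators, y: observed outcomes in {0, ..., J-1}, e: propensity scores.
    """
    N = len(z)
    p1 = [sum(zi * (yi == k) / ei for zi, yi, ei in zip(z, y, e)) / N for k in range(J)]
    p0 = [sum((1 - zi) * (yi == k) / (1 - ei) for zi, yi, ei in zip(z, y, e)) / N
          for k in range(J)]
    return p1, p0

# Toy completely randomized experiment: constant propensity score 1/2 (made-up data).
z = [1, 1, 1, 1, 0, 0, 0, 0]
y = [0, 1, 1, 2, 0, 0, 1, 2]
e = [F(1, 2)] * len(z)

p1_hat, p0_hat = ipw_marginals(z, y, e, J=3)
print([str(p) for p in p1_hat], [str(p) for p in p0_hat])
```

With a constant propensity score and balanced arms, the estimators reduce to the observed category frequencies in each arm.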

4. Randomized Experiments with Noncompliance 
4.1. Causal Effects for Compliers 

Noncompliance is an important topic in practice. For instance, in clinical trials some patients may
not comply with their assigned treatments. Although noncompliance itself has been extensively
investigated in the causal inference literature (e.g., Angrist et al. 1996), there appears to be very
limited discussion about causal inference of ordinal outcomes in the presence of noncompliance. To
the best of our knowledge, Cheng (2009) discussed various causal parameters under the assumption
of one-sided noncompliance, and Baker (2011) generalized her results to two-sided noncompliance;
both of them assumed independent potential outcomes.

Under the Stable Unit Treatment Value Assumption, for unit $i$, let $\{D_i(1), D_i(0)\}$ be the
potential values of treatment received under treatment and control; the observed treatment received
is therefore $D_i^{obs} = Z_i D_i(1) + (1 - Z_i) D_i(0)$. Angrist et al. (1996) proposed to classify
the units into four categories according to the joint values of $D_i(1)$ and $D_i(0)$:

$$G_i = \begin{cases}
a, & \text{if } D_i(1) = 1, D_i(0) = 1, \\
c, & \text{if } D_i(1) = 1, D_i(0) = 0, \\
d, & \text{if } D_i(1) = 0, D_i(0) = 1, \\
n, & \text{if } D_i(1) = 0, D_i(0) = 0,
\end{cases} \tag{9}$$

and referred to the subgroups defined in (9) as always-takers ($a$), compliers ($c$), defiers ($d$)
and never-takers ($n$). Let $\pi_g = \mathrm{pr}(G = g)$ denote the probability of the stratum
$g \in \{a, c, d, n\}$, and

$$g_{kl} = \mathrm{pr}\{Y(1) = k, Y(0) = l \mid G = g\}$$

be the probability of potential outcome $k$ under treatment and potential outcome $l$ under control
within stratum $g$. The $J \times J$ probability matrix $(g_{kl})_{0 \le k, l \le J-1}$ summarizes the
joint distribution of the potential outcomes for stratum $g$. Define

$$g_{k+} = \sum_{l'=0}^{J-1} g_{kl'}, \quad g_{+l} = \sum_{k'=0}^{J-1} g_{k'l} \quad (k, l = 0, 1, \ldots, J-1); \tag{10}$$

the vectors $(g_{0+}, \ldots, g_{J-1,+})^T$ and $(g_{+0}, \ldots, g_{+,J-1})^T$ characterize the marginal
distributions of the potential outcomes under treatment and control. By the law of total probability,

$$p_{kl} = \sum_g \pi_g g_{kl}, \quad p_{k+} = \sum_g \pi_g g_{k+}, \quad p_{+l} = \sum_g \pi_g g_{+l}. \tag{11}$$

We define the subgroup causal parameters within stratum $g$ as

$$\tau_g = \mathrm{pr}\{Y_i(1) \ge Y_i(0) \mid G = g\} = \sum\sum_{k \ge l} g_{kl}, \quad
\eta_g = \mathrm{pr}\{Y_i(1) > Y_i(0) \mid G = g\} = \sum\sum_{k > l} g_{kl}.$$

Following Angrist et al. (1996), we invoke the following assumptions: (1) Complete Randomization,
i.e., $Z \perp \{D(1), D(0), Y(1), Y(0), X\}$; (2) Strong Monotonicity, i.e., $D_i(0) = 0$ for all $i$,
or Monotonicity, i.e., $D_i(1) \ge D_i(0)$ for all $i$; (3) Exclusion Restriction, i.e.,
$D_i(1) = D_i(0)$ implies $Y_i(1) = Y_i(0)$. Monotonicity rules out the defiers with $G = d$, and
strong monotonicity further rules out the always-takers with $G = a$. Exclusion restriction implies
that $\tau_n = 1$, $\eta_n = 0$, $\tau_a = 1$ and $\eta_a = 0$. Therefore, we discuss only the causal
effects for the compliers, i.e., $\tau_c$ and $\eta_c$.

4.2. Bounds on the Causal Effects for Compliers

We focus only on the case under monotonicity, because it is more general than strong monotonicity.
Under monotonicity and exclusion restriction, we can identify the probabilities of always-takers,
compliers and never-takers, i.e., $(\pi_a, \pi_c, \pi_n)$, and the distributions of the potential
outcomes conditional on $G$ (Angrist et al. 1996; Cheng 2009; Baker 2011), i.e., the $g_{k+}$'s and
$g_{+l}$'s. Below, we establish the relationships between the causal parameters $\tau$ and $\tau_c$,
and between $\eta$ and $\eta_c$.

Theorem 5. $\tau_c = \tau/\pi_c - (1 - \pi_c)/\pi_c$ and $\eta_c = \eta/\pi_c$.

Therefore, we can plug in the upper and lower bounds of $\tau$ and $\eta$ to obtain the bounds of
$\tau_c$ and $\eta_c$, using the relationships in Theorem 5. However, these bounds are not sharp, and
the following bounds, implied by Theorems 1 and 2, are narrower.

Corollary 2. The sharp lower and upper bounds of $\tau_c$ are

$$\tau_{c,L} = \max_{0 \le j \le J-1} (c_{+j} + \Delta_{c,j}), \quad \tau_{c,U} = 1 + \min_{0 \le j \le J-1} \Delta_{c,j},$$

and the sharp lower and upper bounds of $\eta_c$ are

$$\eta_{c,L} = \max_{0 \le j \le J-1} \Delta_{c,j}, \quad \eta_{c,U} = 1 + \min_{0 \le j \le J-1} (\Delta_{c,j} - c_{j+}).$$

Similar to Section 3.2, we can use covariates to sharpen the bounds of $\tau_c$. Within each level of
the pretreatment covariates $X = x$, we define the conditional probabilities that the treatment is
beneficial for compliers as

$$\tau_c(x) = \mathrm{pr}\{Y(1) \ge Y(0) \mid G = c, X = x\},$$

and obtain their conditional sharp lower and upper bounds $\tau_{c,L}(x)$ and $\tau_{c,U}(x)$. Because

$$\tau_c = \frac{\int \tau_c(x) \pi_c(x) \, dF(x)}{\int \pi_c(x) \, dF(x)},$$

the bounds for $\tau_c$ become

$$\tau'_{c,L} = \frac{\int \tau_{c,L}(x) \pi_c(x) \, dF(x)}{\int \pi_c(x) \, dF(x)}, \quad
\tau'_{c,U} = \frac{\int \tau_{c,U}(x) \pi_c(x) \, dF(x)}{\int \pi_c(x) \, dF(x)}. \tag{12}$$

Similar to Theorem 4, the adjusted bounds are tighter, i.e.,
$\tau_{c,L} \le \tau'_{c,L} \le \tau'_{c,U} \le \tau_{c,U}$.


4.3. Using Noncompliance to Sharpen Bounds for the Whole Population 

Theorem 5 and Corollary 2 imply two new sets of bounds for $\tau$ and $\eta$, which are tighter than
those in Theorems 1 and 2.

Corollary 3. We can bound $\tau$ from below and above using

$$\tau''_L = \pi_c \tau_{c,L} + 1 - \pi_c, \quad \tau''_U = \pi_c \tau_{c,U} + 1 - \pi_c,$$

and bound $\eta$ from below and above using

$$\eta''_L = \pi_c \eta_{c,L}, \quad \eta''_U = \pi_c \eta_{c,U}.$$

These new bounds are narrower than those in Theorems 1 and 2, because they satisfy
$\tau_L \le \tau''_L$, $\tau_U = \tau''_U$, $\eta_L = \eta''_L$, and $\eta_U \ge \eta''_U$.

There are two reasons that we can obtain tighter bounds. First, we use the partially observed
variable $G$ as a pretreatment variable. Second, the monotonicity and exclusion restriction
assumptions further restrict the probability structure of the potential outcomes.
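
The mapping from complier bounds to whole-population bounds is a simple rescaling, sketched below. The inputs are hypothetical (a complier proportion of 1/2 and made-up complier margins), the function names are ours, and the complier distributional effects are assumed to be defined from the complier margins in the same way as in (1).

```python
from fractions import Fraction as F

def tau_eta_bounds(m1, m0):
    """Closed-form bounds (tau_L, tau_U, eta_L, eta_U) for given treatment/control margins."""
    J = len(m1)
    delta = [sum(m1[j:]) - sum(m0[j:]) for j in range(J)]
    return (max(m0[j] + delta[j] for j in range(J)), 1 + min(delta),
            max(delta), 1 + min(delta[j] - m1[j] for j in range(J)))

def population_bounds_via_compliers(c1, c0, pi_c):
    """Corollary 3: rescale the complier bounds by the complier proportion pi_c."""
    tcL, tcU, ecL, ecU = tau_eta_bounds(c1, c0)
    return (pi_c * tcL + 1 - pi_c, pi_c * tcU + 1 - pi_c, pi_c * ecL, pi_c * ecU)

# Hypothetical complier margins and complier proportion (not taken from the paper).
c1 = [F(1, 5), F(3, 5), F(1, 5)]
c0 = [F(2, 5), F(1, 5), F(2, 5)]
pi_c = F(1, 2)

result = population_bounds_via_compliers(c1, c0, pi_c)
# Theorem 5 in reverse recovers the complier lower bound from the population one.
tau_c_L = (result[0] - (1 - pi_c)) / pi_c
print([str(x) for x in result], tau_c_L)
```

In practice the complier proportion and margins would be estimated, for example by the EM algorithm mentioned in Section 5.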

5. Statistical Inference of the Bounds 

In practice, we need to estimate the marginal probabilities of the potential outcomes and the
bounds. To save space for the main text, we discuss only the bounds of $\tau$ and $\tau_c$.

5.1. Point Estimation 

Completely Randomized Experiments. To estimate the unadjusted bounds, we replace $p_{k+}$
and $p_{+l}$ in Theorem 1 with their sample analogues. Moreover, to estimate the covariate adjusted
bounds in (8), we invoke parametric models such as proportional odds models to estimate the
marginal probabilities of the potential outcomes of unit $i$ as $\hat{p}_{k+}(X_i)$ and
$\hat{p}_{+l}(X_i)$, and use them to estimate the sharp lower and upper bounds, $\hat{\tau}_L(X_i)$ and
$\hat{\tau}_U(X_i)$, for $\tau(X_i)$. Finally, the estimated adjusted bounds of $\tau$ are

$$\hat{\tau}'_L = N^{-1} \sum_{i=1}^{N} \hat{\tau}_L(X_i), \quad \hat{\tau}'_U = N^{-1} \sum_{i=1}^{N} \hat{\tau}_U(X_i).$$

Unconfounded Observational Studies. If we have the propensity score estimator $\hat{e}(X_i)$ for
unit $i$, then we can estimate the marginal probabilities by

$$\hat{p}_{k+} = N^{-1} \sum_{i=1}^{N} \frac{Z_i \, 1(Y_i^{obs} = k)}{\hat{e}(X_i)}, \quad
\hat{p}_{+l} = N^{-1} \sum_{i=1}^{N} \frac{(1 - Z_i) \, 1(Y_i^{obs} = l)}{1 - \hat{e}(X_i)},$$

and then estimate the bounds accordingly.


Completely Randomized Experiments With Noncompliance. Without covariates, we use the EM
algorithm (Dempster et al. 1977) to estimate $\pi_c$, $c_{k+}$ and $c_{+l}$, and then estimate the
unadjusted bounds in Corollary 2. For a more detailed description of the EM algorithm, see Baker
(2011). With covariates, we need to invoke parametric models for $G$ (e.g., a multinomial logistic
model given $X$) and the marginal probabilities of the potential outcomes, and use the EM algorithm
to compute the maximum likelihood estimates of the model parameters (cf. Zhang et al. 2009; Frumento
et al. 2012). After obtaining the sample analogues of $\tau_{c,L}(x)$, $\tau_{c,U}(x)$ and $\pi_c(x)$,
we can estimate the covariate adjusted bounds defined in (12) using a plug-in approach.


5.2. Confidence Intervals 

To quantify the uncertainty associated with the aforementioned estimators of the bounds, we can use
the bootstrap method proposed by Horowitz and Manski (2000) to obtain the confidence intervals
(CI) for the unadjusted and covariate adjusted bounds. For computational details of some other
bootstrap methods, see Cheng and Small (2006) and Yang and Small (2016).

Because the upper and lower bounds involve the maximum and minimum of several terms, their
asymptotic distributions are not normal, and the construction of confidence intervals on the bounds
becomes challenging (Hirano and Porter 2012). Recently, Romano and Shaikh (2008, 2010) and
Chernozhukov et al. (2013) proposed some delicate methods to construct confidence intervals for
partially identified parameters. However, several researchers (e.g., Cheng and Small 2006; Fan and
Park 2010; Yang 2014) evaluated the performance of the bootstrapped confidence intervals for partially
identified parameters (Beran 1988, 1990; Horowitz and Manski 2000) via extensive simulations.
Although the rigorous theoretical guarantee of the bootstrapped confidence intervals has not been fully
established, they found that the bootstrapped confidence intervals work fairly well in various settings,
and that their performance is at least comparable to that of the more delicate methods mentioned before.
Therefore, for simplicity in simulations and transparency in applications, we still use the bootstrap to
construct confidence intervals. We provide the code for implementation, and more sophisticated
users can modify our code to include the more advanced methods.
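As an illustration of the bootstrap step, the following minimal Python sketch computes a joint interval for $(\tau_L, \tau_U)$ in a completely randomized experiment without noncompliance. It assumes the closed forms $\tau_L = \max_j (p_{+j} + \Delta_j)$ and $\tau_U = 1 + \min_j \Delta_j$ with $\Delta_j = \sum_{k \ge j} p_{k+} - \sum_{l \ge j} p_{+l}$, which are the bounds established in the proof of Theorem 1 in the Appendix. The function names, the seed, and the calibration rule (widen both ends by a common amount $z$ until the bootstrap joint coverage reaches the nominal level, in the spirit of Horowitz and Manski 2000) are illustrative assumptions and may differ from the code released with the paper.

```python
import math
import random


def tau_bounds(p1, p0):
    """Sharp bounds (tau_L, tau_U) from the treatment marginal p1 and control marginal p0."""
    J = len(p1)
    delta = [sum(p1[j:]) - sum(p0[j:]) for j in range(J)]
    lower = max(p0[j] + delta[j] for j in range(J))
    upper = 1.0 + min(delta)
    # The clamps only absorb floating-point round-off; mathematically 0 <= lower <= upper <= 1.
    return max(0.0, lower), min(1.0, upper)


def proportions(sample, J):
    """Empirical distribution of an ordinal sample coded 0, ..., J-1."""
    return [sum(1 for y in sample if y == j) / len(sample) for j in range(J)]


def bootstrap_ci(y_treat, y_ctrl, J, level=0.95, n_boot=1000, seed=0):
    """Interval [tau_L_hat - z, tau_U_hat + z] calibrated to cover both bounds jointly."""
    rng = random.Random(seed)
    lo_hat, up_hat = tau_bounds(proportions(y_treat, J), proportions(y_ctrl, J))
    needed = []
    for _ in range(n_boot):
        bt = rng.choices(y_treat, k=len(y_treat))  # resample within each arm
        bc = rng.choices(y_ctrl, k=len(y_ctrl))
        lo_b, up_b = tau_bounds(proportions(bt, J), proportions(bc, J))
        # Smallest z with lo_b - z <= lo_hat and up_hat <= up_b + z.
        needed.append(max(lo_b - lo_hat, up_hat - up_b, 0.0))
    needed.sort()
    idx = min(n_boot - 1, math.ceil(level * n_boot) - 1)
    z = needed[idx]
    return max(0.0, lo_hat - z), min(1.0, up_hat + z)
```

Calling `bootstrap_ci(y_treat, y_ctrl, J)` with two lists of observed categories returns a pair of limits; by construction the interval always contains both point estimates of the bounds.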

6. Simulation Studies 

6.1. Without Noncompliance 

To save space in the main text, we focus only on $\tau$ and its bounds in Theorem 1. We choose the
sample size to be 200, and consider four cases with different probability matrices $P$. Cases 1 and
2 correspond to matrices $P_2$ and $P_3$ in (6), i.e., the independent and positively associated potential
outcomes, which share the same marginal distribution but do not satisfy the stochastic dominance
assumption. Cases 3 and 4 correspond to matrices $P_5$ and $P_6$ in (7), i.e., the independent and
positively associated potential outcomes, which share the same marginal distribution and satisfy
the stochastic dominance assumption. Columns 2-4 of Table 1 summarize the true values of $\tau$, $\tau_L$
and $\tau_U$, for all four cases. For Cases 1 and 3 with independent potential outcomes, $\tau_L < \tau < \tau_U$.
For Cases 2 and 4 with positively associated potential outcomes, $\tau = \tau_U$.

For each case, we independently draw 5000 treatment assignments from a balanced completely
randomized experiment. For each observed dataset, we calculate point estimates of $\tau_L$ and $\tau_U$, and
construct a 95% confidence interval for the bounds $(\tau_L, \tau_U)$, i.e., a confidence interval that contains
both the lower bound $\tau_L$ and the upper bound $\tau_U$ with probability 0.95. In columns 5-8 of Table
1 we report the biases and standard errors of the point estimators of $\tau_L$ and $\tau_U$; in columns 9 and
10 of Table 1 we report the coverage rates of the intervals on the bounds $(\tau_L, \tau_U)$ and the true
parameter $\tau$. Table 1 shows that the point estimators have small biases and standard errors, and
the confidence intervals achieve reasonable coverage rates on the bounds $(\tau_L, \tau_U)$, although they
over-cover the true parameter $\tau$.

Table 1: Numerical examples without noncompliance. The first three columns contain the true
values, the next four columns contain the biases and standard errors of the point estimators of the
bounds, and the last two columns contain the coverage properties of the confidence intervals for
the bounds and the true parameter.

Case   tau     tau_L   tau_U   bias_L   se_L    bias_U   se_U    coverage_1   coverage_2
1      0.640   0.400   0.800   0.016    0.037    0.000   0.045   0.987        1.000
2      0.800   0.400   0.800   0.013    0.043   -0.001   0.057   0.957        0.974
3      0.880   0.600   1.000   0.026    0.030    0.000   0.000   0.967        1.000
4      1.000   0.600   1.000   0.025    0.031    0.000   0.000   0.960        1.000

6.2. With Noncompliance

To evaluate the finite sample performances of the estimators and the confidence intervals of the
bounds, we conduct simulation studies under different model specifications. To save space, we
focus only on the parameter $\tau_c$, and consider six simulation cases. Cases 1-3 are indexed by the
parameter $\beta \in \{1, 1/2, 0\}$, and Cases 4-6 are indexed by the parameter $\xi \in \{1, 1/2, 0\}$. We postpone
the interpretations of $\beta$ and $\xi$ until afterwards. For each case, let the pretreatment covariates
$X = (1, X_1, X_2)$, where $X_1 \sim N(0, 1)$, and $X_2 \sim \mathrm{Bern}(1/2)$. For fixed $X = x$, we generate the
variable $G$ from a multiple logistic model:


$$\pi_g(x) = \frac{\exp(\eta_g^\top x)}{\sum_{g'} \exp(\eta_{g'}^\top x)} \quad (g = a, c, n),$$

where $\eta_c = 0$, $\eta_a = (1/2, 1, 0)^\top$ and $\eta_n = (-1/2, 1, 0)^\top$. We generate the potential outcomes from
proportional odds models.


1. For always-takers, let $Y(1) = Y(0)$, and their marginal distributions be

$$\mathrm{logit}\Big\{\sum_{k \le j} a_{k+}(x)\Big\} = \mathrm{logit}\Big\{\sum_{l \le j} a_{+l}(x)\Big\} = \alpha_{a,j} - 2x_1,$$

where $\alpha_{a,0} = -1/2$ and $\alpha_{a,1} = 1$.

2. For never-takers, let $Y(1) = Y(0)$, and their marginal distributions be

$$\mathrm{logit}\Big\{\sum_{k \le j} n_{k+}(x)\Big\} = \mathrm{logit}\Big\{\sum_{l \le j} n_{+l}(x)\Big\} = \alpha_{n,j},$$

where $\alpha_{n,0} = -3/2$ and $\alpha_{n,1} = 0$.

3. For compliers, let $Y(1)$ and $Y(0)$ be independent, and the values of the parameters be $\alpha_{c,0} = -1$,
$\alpha_{c,1} = 1/2$, $\gamma_{c,0} = 1/2$ and $\gamma_{c,1} = 2$.

(a) For Cases 1-3, let the marginal distributions be

$$\mathrm{logit}\Big\{\sum_{k \le j} c_{k+}(x)\Big\} = \alpha_{c,j} - 2\beta x_1, \quad \mathrm{logit}\Big\{\sum_{l \le j} c_{+l}(x)\Big\} = \gamma_{c,j} + \beta x_1,$$

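(b) For Cases 4-6, let the marginal distributions be

$$\mathrm{logit}\Big\{\sum_{k \le j} c_{k+}(x)\Big\} = \alpha_{c,j} - 2x_1 - \xi x_2, \quad \mathrm{logit}\Big\{\sum_{l \le j} c_{+l}(x)\Big\} = \gamma_{c,j} + x_1 + \xi x_2.$$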
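The data generating process above can be sketched in a few lines of Python. The sketch covers two building blocks only: the compliance-type probabilities $\pi_g(x)$ from the multiple logistic model, and the conversion of a proportional odds specification into category probabilities. The names `ETA`, `strata_probs` and `category_probs` are hypothetical, and the linear predictor passed as `shift` (for example $-2\beta x_1$ for the treatment margin of compliers in Cases 1-3) follows our reading of the displayed equations, so it should be checked against the original before reuse.

```python
import math

# Coefficients eta_g of the multiple logistic model for G (g = a, c, n).
ETA = {"a": (0.5, 1.0, 0.0), "c": (0.0, 0.0, 0.0), "n": (-0.5, 1.0, 0.0)}


def strata_probs(x):
    """Return pi_g(x) for g in {a, c, n}, with x = (1, x1, x2)."""
    scores = {g: math.exp(sum(e * xi for e, xi in zip(eta, x)))
              for g, eta in ETA.items()}
    total = sum(scores.values())
    return {g: s / total for g, s in scores.items()}


def expit(t):
    return 1.0 / (1.0 + math.exp(-t))


def category_probs(cutpoints, shift):
    """Category probabilities when logit Pr(Y <= j) = cutpoints[j] + shift."""
    cum = [expit(c + shift) for c in cutpoints] + [1.0]
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]


# Example: a unit with x1 = 0.3, x2 = 1, and the treatment margin of compliers
# in Case 1 (beta = 1), assuming the linear predictor alpha_{c,j} - 2 * beta * x1.
x = (1.0, 0.3, 1.0)
pi = strata_probs(x)
treated_margin = category_probs((-1.0, 0.5), -2.0 * 1.0 * x[1])
```

The three-category probabilities returned by `category_probs` always sum to one because the last cumulative probability is fixed at 1.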


For the above six cases, their true values of $\tau_c$, unadjusted and adjusted bounds are in columns
2-4 of each sub-table of Table 2. For Cases 1-3, the parameter $\beta$ quantifies the association between
the covariates and the potential outcomes. As $\beta$ decreases, the covariate adjusted bounds become
closer to the unadjusted bounds. For Cases 4-6, the parameter $\xi$ quantifies the association between
the binary covariate $X_2$ and the potential outcomes of compliers.

We conduct inference without the binary covariate $X_2$. This does not affect Cases 1-3 because
$X_2$ is irrelevant in the data generating process, but does affect Cases 4-6. We purposefully design
the data generating process in this way, to examine the performance of our estimators under correct
and incorrect model specifications. For each case, we choose the sample size to be 1000, and
independently draw 1000 treatment assignments from a balanced completely randomized experiment.
For each observed dataset, we first estimate the bounds $\tau_{c,L}$ and $\tau_{c,U}$, and construct a 95%
confidence interval for $(\tau_{c,L}, \tau_{c,U})$; we then estimate the bounds $\tau'_{c,L}$ and $\tau'_{c,U}$, and construct a 95%
confidence interval for $(\tau'_{c,L}, \tau'_{c,U})$.

We report the simulation results in Table 2, in which columns 5-8 of each sub-table include
the biases of the point estimators, and the average lengths and coverage rates of the 95% confidence
intervals on the bounds. First, the point estimators of the bounds have small biases. Second, when
the pretreatment covariates are associated with the potential outcomes, the confidence intervals
of the bounds $(\tau_{c,L}, \tau_{c,U})$ are longer than those of $(\tau'_{c,L}, \tau'_{c,U})$, on average. Third, the confidence
intervals for the bounds $(\tau_{c,L}, \tau_{c,U})$ and $(\tau'_{c,L}, \tau'_{c,U})$ achieve reasonable coverage rates. Fourth, the
performance of the bounds is robust to the omission of the binary covariate, or, equivalently, to
misspecification of the outcome models.

Table 2: Numerical examples with noncompliance. In each sub-table, the first three columns contain
the true values of the causal parameter $\tau_c$ and its lower and upper bounds, the next two columns
contain the biases of the point estimators of the lower and upper bounds, and the last two columns
contain the lengths and coverage rates of the 95% confidence intervals for the bounds.

(a) Unadjusted Bounds

Case   tau_c   tau_{c,L}   tau_{c,U}   bias_L   bias_U   length   coverage
1      0.686   0.488       0.970       0.002    -0.028   0.658    0.947
2      0.770   0.553       1.000       0.005    -0.005   0.574    0.973
3      0.856   0.622       1.000       0.034    -0.000   0.485    0.958
4      0.782   0.590       1.000       0.000    -0.002   0.528    0.976
5      0.738   0.542       1.000       0.002    -0.016   0.588    0.966
6      0.686   0.488       0.970       0.002    -0.028   0.658    0.947

(b) Adjusted Bounds

Case   tau_c   tau'_{c,L}  tau'_{c,U}  bias_L   bias_U   length   coverage
1      0.686   0.503       0.772       0.008    -0.006   0.466    0.968
2      0.770   0.563       0.935       0.004    -0.007   0.530    0.968
3      0.856   0.622       1.000       0.021    -0.001   0.489    0.959
4      0.782   0.602       0.846       0.005     0.012   0.436    0.960
5      0.738   0.556       0.817       0.007    -0.003   0.447    0.965
6      0.686   0.503       0.772       0.008    -0.006   0.466    0.968

7. Applications


7.1. A Taste-Testing Experiment Without Noncompliance

We use the taste-testing experiment data in Bradley et al. (1962) to demonstrate the estimation
and inference of the proposed causal parameters. The outcome of interest $Y$ is ordinal with five
categories, from "terrible" with $Y = 0$ to "excellent" with $Y = 4$. We consider only three treatments
C, D, E, and summarize the data and results in Table 3. Negatively associated potential
outcomes appear unlikely for this example; therefore we focus on the interpretations of the cases
with independent and positively correlated potential outcomes, e.g., $\tau_I$ and $\tau_U$. First, treatment E
stochastically dominates treatment C, and the confidence intervals for $(\tau_I, \tau_U)$ and $(\eta_I, \eta_U)$ are
(0.913, 1.000) and (0.651, 1.000). The results suggest that treatment E is indeed better than treatment
C, because both lower confidence limits are greater than 0.5. Second, although treatment E
and treatment D do not stochastically dominate each other, the confidence intervals for $(\tau_I, \tau_U)$ and
$(\eta_I, \eta_U)$ are (0.656, 0.982) and (0.519, 0.886), suggesting that treatment E is better than treatment
D. Therefore the proposed causal parameters $\tau$ and $\eta$ are useful for decision making, especially
when the stochastic dominance assumption does not hold.
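The point estimates in Table 3 can be checked directly from the counts in Table 3(a). The sketch below is a minimal Python illustration, assuming the closed forms $\tau_L = \max_j (p_{+j} + \Delta_j)$ and $\tau_U = 1 + \min_j \Delta_j$ from the proof of Theorem 1, the value $\tau_I = \sum_{k \ge l} p_{k+} p_{+l}$ under independence, and the identity $\eta = 1 - \Pr\{Y(0) \ge Y(1)\}$ to obtain the $\eta$ quantities by swapping the two arms. Function names are hypothetical.

```python
def normalize(counts):
    n = sum(counts)
    return [c / n for c in counts]


def tau_summary(p1, p0):
    """Return (tau_L, tau_I, tau_U) for tau = Pr{Y(1) >= Y(0)}."""
    J = len(p1)
    delta = [sum(p1[j:]) - sum(p0[j:]) for j in range(J)]
    tau_l = max(p0[j] + delta[j] for j in range(J))
    tau_u = 1.0 + min(delta)
    tau_i = sum(p1[k] * p0[l] for k in range(J) for l in range(k + 1))
    return tau_l, tau_i, tau_u


def eta_summary(p1, p0):
    """Return (eta_L, eta_I, eta_U) for eta = Pr{Y(1) > Y(0)} by swapping the arms."""
    rev_l, rev_i, rev_u = tau_summary(p0, p1)
    return 1.0 - rev_u, 1.0 - rev_i, 1.0 - rev_l


# Counts from Table 3(a); the first argument plays the role of the treatment arm.
C = normalize([14, 13, 6, 7, 0])
D = normalize([11, 15, 3, 5, 8])
E = normalize([0, 2, 10, 30, 2])

for label, ctrl in (("E vs C", C), ("E vs D", D)):
    print(label, [round(v, 3) for v in tau_summary(E, ctrl)],
          [round(v, 3) for v in eta_summary(E, ctrl)])
```

The computed values agree with the point estimates reported in Table 3(b) and 3(c) up to rounding in the third decimal place.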

Table 3: Analysis of a Taste-Testing Experiment

(a) Data from Bradley et al. (1962)

            Outcome Categories
Treatment   0    1    2    3    4    row sum
C           14   13   6    7    0    40
D           11   15   3    5    8    42
E           0    2    10   30   2    44

(b) Results for $\tau$

         tau_L   tau_I   tau_U   CI for (tau_L, tau_U)   CI for (tau_I, tau_U)
E vs C   0.779   0.945   1.000   (0.673, 1.000)          (0.913, 1.000)
E vs D   0.645   0.782   0.855   (0.495, 1.000)          (0.656, 0.982)

(c) Results for $\eta$

         eta_L   eta_I   eta_U   CI for (eta_L, eta_U)   CI for (eta_I, eta_U)
E vs C   0.630   0.777   0.870   (0.480, 1.000)          (0.651, 1.000)
E vs D   0.574   0.660   0.736   (0.423, 0.886)          (0.519, 0.886)

7.2. A Job Training Program with Noncompliance


In the mid-1990s, Mathematica Policy Research conducted an experiment that randomly enrolled
eligible applicants into the Job Corps program (Schochet et al. 2003; Lee 2009). We re-analyzed
the dataset from 1995 with 13499 units. For detailed descriptions of the dataset, see Zhang et al.
(2009) and Frumento et al. (2012). In our analysis, $Z = 1$ if an applicant was enrolled in the
program, and $Z = 0$ otherwise; $D = 1$ if an applicant actually participated in the program, and
$D = 0$ otherwise. The strong monotonicity assumption holds by design. Using the hourly wage
after 52 weeks of enrollment, we create a three-level ordinal outcome $Y$ as follows: $Y = 0$ for zero
wage because of unemployment, $Y = 1$ for low wage (no more than 4.25 U.S. dollars, 150% of the
minimum wage at the time the data were collected), and $Y = 2$ for high wage (more than 4.25 U.S.
dollars). The covariates include gender, age, education, marital status, etc.

Table 4 summarizes the results. For both causal parameters $\tau_c$ and $\eta_c$, the confidence intervals
for the lower and upper bounds become narrower when we take covariates into account. As in
the previous example, we focus on the interpretations of the cases with independent and positively
correlated potential outcomes. The confidence intervals with or without covariates for $(\tau_{c,I}, \tau_{c,U})$
suggest that the hourly wages of more than 70% of participants do not decrease because of the
job training program. Additionally, the confidence intervals with or without covariates for $(\eta_{c,I}, \eta_{c,U})$
suggest that the hourly wages of roughly 20%-30% of participants strictly increase because of the
job training program.


Table 4: Analysis of the Job Corps Program

(a) Results for $\tau_c$

                 tau_{c,L}   tau_{c,I}   tau_{c,U}   CI for (tau_{c,L}, tau_{c,U})   CI for (tau_{c,I}, tau_{c,U})
w/o Covariates   0.561       0.708       0.913       (0.538, 0.937)                  (0.687, 0.934)
w/ Covariates    0.592       0.722       0.910       (0.571, 0.931)                  (0.701, 0.931)

(b) Results for $\eta_c$

                 eta_{c,L}   eta_{c,I}   eta_{c,U}   CI for (eta_{c,L}, eta_{c,U})   CI for (eta_{c,I}, eta_{c,U})
w/o Covariates   0.005       0.209       0.352       (0.000, 0.361)                  (0.199, 0.362)
w/ Covariates    0.006       0.193       0.319       (0.000, 0.329)                  (0.181, 0.330)


As a final note, we use this example to illustrate Corollary 3. Without the noncompliance
information, the estimators of the bounds of $\tau$ are $\tau_L = 0.558$ and $\tau_U = 0.937$, with 95% confidence
interval (0.542, 0.953); the estimators of the bounds of $\eta$ are $\eta_L = 0.004$ and $\eta_U = 0.379$, with
95% confidence interval (0.000, 0.388). With the noncompliance information, the estimators of
the bounds of $\tau$ are $\tau'_L = 0.683$ and $\tau'_U = 0.937$, with 95% confidence interval (0.667, 0.953); the
estimators of the bounds of $\eta$ are $\eta'_L = 0.004$ and $\eta'_U = 0.254$, with 95% confidence interval (0.000,
0.262). Therefore, the noncompliance information in turn improves the inference of $\tau$ and $\eta$ for
the whole population.


8. Concluding Remarks

We proposed to use two causal parameters to evaluate treatment effects on ordinal outcomes, and
derived the explicit forms of their sharp bounds by using only the marginal distributions of the
potential outcomes. Although we advocate the use of parameters $\tau$ and $\eta$ to measure treatment
effects, we acknowledge that some other causal parameters may also provide some information in
practice (e.g., Agresti 2010; Volfovsky et al. 2015). For general parameters, although deriving the
explicit forms of the bounds may be difficult, we may use numerical methods. For instance, we
can use numerical linear programs to calculate the maximum and minimum values of the relative
treatment effect $\alpha = \tau + \eta - 1$ under the constraints in (4).
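To make the linear programming remark concrete, the problem can be written out explicitly. The display below is a sketch that assumes the constraints in (4) are the usual requirements that the joint probabilities $p_{kl} = \Pr\{Y(1) = k, Y(0) = l\}$ are nonnegative and reproduce the two marginal distributions; it uses the identity $\tau + \eta - 1 = \sum_{k>l} p_{kl} - \sum_{k<l} p_{kl}$, which follows from $\tau = \sum_{k \ge l} p_{kl}$, $\eta = \sum_{k > l} p_{kl}$ and $\sum_{k,l} p_{kl} = 1$.

```latex
\begin{aligned}
\text{maximize / minimize} \quad & \sum_{k > l} p_{kl} - \sum_{k < l} p_{kl} \\
\text{subject to} \quad & \sum_{l=0}^{J-1} p_{kl} = p_{k+} \quad (k = 0, \ldots, J-1), \\
& \sum_{k=0}^{J-1} p_{kl} = p_{+l} \quad (l = 0, \ldots, J-1), \\
& p_{kl} \ge 0 \quad (k, l = 0, \ldots, J-1).
\end{aligned}
```

Both the objective and the constraints are linear in the $J^2$ unknowns $p_{kl}$, so any standard linear programming solver applies once the marginal probabilities are replaced by their estimates.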


Appendix 


We first state a lemma extending a result in Strassen (1965). This lemma plays a central role in
our later proofs, and is also of independent interest. We provide the proof of the lemma in the
Supplementary Material. In this Appendix, we present only the proof of Theorem 1, and relegate
the proofs of other theorems and corollaries to the Supplementary Material.


Lemma 1. Assume that $(x_0, \ldots, x_{n-1})$ and $(y_0, \ldots, y_{n-1})$ are nonnegative constants.

(a) If $\sum_{r=s}^{n-1} x_r \ge \sum_{r=s}^{n-1} y_r$ for all $s = 0, \ldots, n-1$, there exists an $n \times n$ lower triangular matrix
$A_n = (a_{kl})_{0 \le k,l \le n-1}$ with nonnegative elements such that

$$\sum_{l'=0}^{n-1} a_{kl'} \le x_k, \quad \sum_{k'=0}^{n-1} a_{k'l} = y_l \quad (k, l = 0, \ldots, n-1). \tag{13}$$

(b) If $\sum_{r=s}^{n-1} x_r \le \sum_{r=s}^{n-1} y_r$ for all $s = 0, \ldots, n-1$, there exists an $n \times n$ upper triangular matrix
$B_n = (b_{kl})_{0 \le k,l \le n-1}$ with nonnegative elements such that

$$\sum_{l'=0}^{n-1} b_{kl'} = x_k, \quad \sum_{k'=0}^{n-1} b_{k'l} \le y_l \quad (k, l = 0, \ldots, n-1). \tag{14}$$

(c) If $\sum_{r=0}^{s} x_r \le \sum_{r=0}^{s} y_r$ for all $s = 0, \ldots, n-1$, there exists an $n \times n$ lower triangular matrix
$C_n = (c_{kl})_{0 \le k,l \le n-1}$ with nonnegative elements such that

$$\sum_{l'=0}^{n-1} c_{kl'} = x_k, \quad \sum_{k'=0}^{n-1} c_{k'l} \le y_l \quad (k, l = 0, \ldots, n-1). \tag{15}$$

(d) If $\sum_{r=0}^{s} x_r \ge \sum_{r=0}^{s} y_r$ for all $s = 0, \ldots, n-1$, there exists an $n \times n$ upper triangular matrix
$D_n = (d_{kl})_{0 \le k,l \le n-1}$ with nonnegative elements such that

$$\sum_{l'=0}^{n-1} d_{kl'} \le x_k, \quad \sum_{k'=0}^{n-1} d_{k'l} = y_l \quad (k, l = 0, \ldots, n-1). \tag{16}$$

(e) If we further assume $\sum_{r=0}^{n-1} y_r = \sum_{r=0}^{n-1} x_r$, the above inequalities in (13)-(16) all reduce to
equalities, i.e., the matrices $A_n$, $B_n$, $C_n$ and $D_n$ have $(x_0, \ldots, x_{n-1})$ and $(y_0, \ldots, y_{n-1})$ as
their row and column sums.


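Proof of Theorem 1. For all $j = 0, 1, \ldots, J-1$,

$$\tau = \sum_{k \ge l} p_{kl} = 1 - \sum_{k < l} p_{kl}$$

$$\le 1 - \sum_{k < j} \sum_{l \ge j} p_{kl} = 1 - \Big( \sum_{k=0}^{J-1} \sum_{l \ge j} p_{kl} - \sum_{k \ge j} \sum_{l \ge j} p_{kl} \Big) \tag{17}$$

$$\le 1 - \Big( \sum_{k=0}^{J-1} \sum_{l \ge j} p_{kl} - \sum_{k \ge j} \sum_{l=0}^{J-1} p_{kl} \Big) = 1 - \Big( \sum_{l \ge j} p_{+l} - \sum_{k \ge j} p_{k+} \Big) \tag{18}$$

$$= 1 + \Delta_j,$$

and

$$\tau = \sum_{k \ge l} p_{kl} \ge \sum_{k \ge j} \sum_{l \le j} p_{kl} = \sum_{k \ge j} \sum_{l=0}^{J-1} p_{kl} - \sum_{k \ge j} \sum_{l > j} p_{kl} \tag{19}$$

$$\ge \sum_{k \ge j} \sum_{l=0}^{J-1} p_{kl} - \sum_{k=0}^{J-1} \sum_{l > j} p_{kl} = \sum_{k \ge j} p_{k+} - \sum_{l > j} p_{+l} \tag{20}$$

$$= p_{+j} + \Delta_j,$$

which implies that $\tau_L \le \tau \le \tau_U$.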
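The inequalities (17)-(20) can be checked numerically. The following Python sketch draws arbitrary joint probability matrices, computes $\tau$ together with $\max_j (p_{+j} + \Delta_j)$ and $1 + \min_j \Delta_j$ from the implied marginals, and lets the caller confirm that $\tau$ always lies between the two. The helper names and the way random matrices are generated are illustrative assumptions; the check illustrates validity of the bounds only, not their sharpness.

```python
import random


def tau_and_bounds(P):
    """Return (tau, lower, upper) for a joint matrix P with P[k][l] = Pr{Y(1)=k, Y(0)=l}."""
    J = len(P)
    p1 = [sum(P[k]) for k in range(J)]                        # p_{k+}
    p0 = [sum(P[k][l] for k in range(J)) for l in range(J)]   # p_{+l}
    tau = sum(P[k][l] for k in range(J) for l in range(k + 1))
    delta = [sum(p1[j:]) - sum(p0[j:]) for j in range(J)]
    lower = max(p0[j] + delta[j] for j in range(J))
    upper = 1.0 + min(delta)
    return tau, lower, upper


def random_joint(J, rng):
    """A random J x J probability matrix (positive weights, normalized to sum to one)."""
    w = [[rng.random() for _ in range(J)] for _ in range(J)]
    total = sum(map(sum, w))
    return [[v / total for v in row] for row in w]


rng = random.Random(2016)
checks = [tau_and_bounds(random_joint(4, rng)) for _ in range(500)]
all_inside = all(lo - 1e-12 <= tau <= up + 1e-12 for tau, lo, up in checks)
```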

We now construct two probability matrices attaining the lower and upper bounds respectively,
using Lemma 1.

We first construct a probability matrix attaining the upper bound $\tau_U$. Let

$$j_1 = \min\Big\{ 0 \le j' \le J-1 : \Delta_{j'} = \min_{0 \le j \le J-1} \Delta_j \Big\}$$

be the minimum index $j$ that attains the minimum value of the $\Delta_j$'s. To attain $\tau_U$, the equalities in
(17) and (18) must hold, i.e.,

$$\sum_{k < l} p_{kl} = \sum_{k < j_1} \sum_{l \ge j_1} p_{kl}, \quad \sum_{k \ge j_1} \sum_{l \ge j_1} p_{kl} = \sum_{k \ge j_1} \sum_{l=0}^{J-1} p_{kl}. \tag{21}$$

If $j_1 = 0$, $\min_{0 \le j \le J-1} \Delta_j = \Delta_0 = 0$, implying that $\Delta_j = \sum_{k=j}^{J-1} p_{k+} - \sum_{l=j}^{J-1} p_{+l} \ge 0$ for
all $j$, i.e., the marginal probabilities satisfy the stochastic dominance assumption. According to
Lemma 1(c), there exists a lower triangular probability matrix $P$ with marginal probabilities
$p_1 = (p_{0+}, \ldots, p_{J-1,+})^\top$ and $p_0 = (p_{+0}, \ldots, p_{+,J-1})^\top$. Correspondingly, $\tau = 1 + \Delta_0 = 1$.

If $j_1 > 0$, the constraints in (21) force some elements of the probability matrix to be zeros. To
be more specific, the constraints in (21) imply that the probability matrix has the following block
structure:

$$P = \begin{pmatrix} P_{tl} & P_{tr} \\ 0 & P_{br} \end{pmatrix}, \tag{22}$$

where the $j_1 \times j_1$ sub-matrix $P_{tl}$ on top left and the $(J - j_1) \times (J - j_1)$ sub-matrix $P_{br}$ on bottom
right are both lower triangular, and the $j_1 \times (J - j_1)$ sub-matrix $P_{tr}$ on top right has no restrictions.
Because $\Delta_{j_1} \le \Delta_j$ for all $j = 0, 1, \ldots, J-1$, we have

$$\sum_{k=j}^{j_1-1} p_{k+} \ge \sum_{l=j}^{j_1-1} p_{+l} \quad (j = 0, \ldots, j_1 - 1); \qquad \sum_{k=j_1}^{j} p_{k+} \le \sum_{l=j_1}^{j} p_{+l} \quad (j = j_1, \ldots, J-1).$$

Given the above two sets of constraints on the marginal probabilities, we construct the probability
matrix $P$ in three steps.

(1) We apply Lemma 1(a) to $(p_{0+}, \ldots, p_{j_1-1,+})$ and $(p_{+0}, \ldots, p_{+,j_1-1})$, and obtain a lower triangular
matrix $P_{tl} = (p_{kl})_{0 \le k,l \le j_1-1}$ with nonnegative elements such that

$$\sum_{l'=0}^{j_1-1} p_{kl'} \le p_{k+}, \quad \sum_{k'=0}^{j_1-1} p_{k'l} = p_{+l} \quad (k, l = 0, \ldots, j_1 - 1).$$

(2) We apply Lemma 1(c) to $(p_{j_1+}, \ldots, p_{J-1,+})$ and $(p_{+j_1}, \ldots, p_{+,J-1})$, and obtain a lower triangular
matrix $P_{br} = (p_{kl})_{j_1 \le k,l \le J-1}$ with nonnegative elements such that

$$\sum_{l'=j_1}^{J-1} p_{kl'} = p_{k+}, \quad \sum_{k'=j_1}^{J-1} p_{k'l} \le p_{+l} \quad (k, l = j_1, \ldots, J-1).$$

(3) We construct $P_{tr} = (p_{kl})_{0 \le k \le j_1-1,\, j_1 \le l \le J-1}$ by letting

$$p_{kl} = \frac{\big( p_{k+} - \sum_{l'=0}^{j_1-1} p_{kl'} \big) \big( p_{+l} - \sum_{k'=j_1}^{J-1} p_{k'l} \big)}{-\Delta_{j_1}} \ge 0 \quad (k = 0, \ldots, j_1 - 1;\ l = j_1, \ldots, J-1).$$

The constructed probability matrix $P$ has marginal probabilities $p_1 = (p_{0+}, \ldots, p_{J-1,+})^\top$ and
$p_0 = (p_{+0}, \ldots, p_{+,J-1})^\top$. What is more, by (22) the $\tau$ of $P$ is the sum of all the elements in $P_{tl}$
and $P_{br}$, which we construct in the above (1) and (2). Therefore, we have

$$\tau = \sum_{l'=0}^{j_1-1} p_{+l'} + \sum_{k'=j_1}^{J-1} p_{k'+} = 1 + \Delta_{j_1},$$

which implies that the probability matrix $P$ attains $\tau_U$.

We then construct a probability matrix attaining the lower bound $\tau_L$. Let


$$j_2 = \min\Big\{ j' : p_{+j'} + \Delta_{j'} = \max_{0 \le j \le J-1} (p_{+j} + \Delta_j) \Big\}$$

be the minimum index $j$ that attains the maximum value of the $(p_{+j} + \Delta_j)$'s. To attain $\tau_L$, the equalities
in (19) and (20) must hold, i.e.,

$$\sum_{k \ge l} p_{kl} = \sum_{k \ge j_2} \sum_{l \le j_2} p_{kl}, \quad \sum_{k \ge j_2} \sum_{l > j_2} p_{kl} = \sum_{k=0}^{J-1} \sum_{l > j_2} p_{kl}. \tag{23}$$

If $j_2 = 0$, from (23) we know that the elements in the lower triangular part but not in the first
column of the probability matrix $P$ are all zeros, i.e.,

$$P = \begin{pmatrix} p & P_{tr} \\ p_{J-1,0} & 0^\top \end{pmatrix}, \tag{24}$$

where $p = (p_{0,0}, \ldots, p_{J-2,0})^\top$, and the $(J-1) \times (J-1)$ sub-matrix $P_{tr}$ on top right is upper
triangular. Because $p_{+0} + \Delta_0 \ge p_{+j} + \Delta_j$ for all $j$, we have

$$\sum_{k=0}^{j} p_{k+} \ge \sum_{l=0}^{j} p_{+,l+1} \quad (j = 0, \ldots, J-2).$$

Applying Lemma 1(d) to $(p_{0+}, \ldots, p_{J-2,+})$ and $(p_{+1}, \ldots, p_{+,J-1})$, we obtain an upper triangular
matrix $P_{tr} = (p_{kl})_{0 \le k \le J-2,\, 1 \le l \le J-1}$ with nonnegative elements such that

$$\sum_{l'=1}^{J-1} p_{kl'} \le p_{k+}, \quad \sum_{k'=0}^{J-2} p_{k'l} = p_{+l} \quad (k = 0, \ldots, J-2;\ l = 1, \ldots, J-1).$$

To complete the construction, let $p_{J-1,0} = p_{J-1,+}$, and

$$p_{k0} = p_{k+} - \sum_{l'=1}^{J-1} p_{kl'} \ge 0 \quad (k = 0, \ldots, J-2).$$

The constructed probability matrix $P$ has marginal probabilities $p_1 = (p_{0+}, \ldots, p_{J-1,+})^\top$ and
$p_0 = (p_{+0}, \ldots, p_{+,J-1})^\top$. Moreover, by (24) the $\tau$ of $P$ is the sum of all the elements in the first
column. Therefore $\tau = p_{+0} = p_{+0} + \Delta_0$, which implies that $P$ attains $\tau_L$.

If $j_2 = J-1$, the proof is similar to the above case with $j_2 = 0$. If $0 < j_2 < J-1$, because the
first equality in (23) is equivalent to

$$\sum_{k < j_2} \sum_{l \le k} p_{kl} + \sum_{k \ge j_2} \sum_{l \le k} p_{kl} = \sum_{k \ge j_2} \sum_{l \le j_2} p_{kl},$$

the probability matrix $P$ must satisfy the following constraints:

(C1) For all $k = 0, \ldots, j_2 - 1$, $p_{kl} = 0$ for all $l = 0, \ldots, k$.

(C2) For all $k = j_2 + 1, \ldots, J-1$, $p_{kl} = 0$ for all $l = j_2 + 1, \ldots, k$.

Similarly, because the second equality in (23) is equivalent to

$$\sum_{k \ge j_2} \sum_{l > j_2} p_{kl} = \sum_{k \ge j_2} \sum_{l > j_2} p_{kl} + \sum_{k < j_2} \sum_{l > j_2} p_{kl},$$

the probability matrix $P$ must further satisfy the following constraint:

(C3) $p_{kl} = 0$, for all $k = 0, \ldots, j_2 - 1$ and $l = j_2 + 1, \ldots, J-1$.

The constraints in (C1), (C2) and (C3) imply that $P$ must have the following block structure:

$$P = \begin{pmatrix} \begin{matrix} 0 & P_{tl} \end{matrix} & 0 \\ P_{bl} & \begin{matrix} P_{br} \\ 0^\top \end{matrix} \end{pmatrix}, \tag{25}$$

where the $j_2 \times j_2$ sub-matrix $P_{tl}$ and the $(J - j_2 - 1) \times (J - j_2 - 1)$ sub-matrix $P_{br}$ are both upper
triangular, and the $(J - j_2) \times (j_2 + 1)$ sub-matrix $P_{bl}$ on bottom left has no restrictions.

Because $p_{+j_2} + \Delta_{j_2} \ge p_{+j} + \Delta_j$ for all $j$, we have

$$\sum_{k=j}^{j_2-1} p_{k+} \le \sum_{l=j}^{j_2-1} p_{+,l+1} \quad (j = 0, \ldots, j_2 - 1); \qquad \sum_{k=j_2}^{j} p_{k+} \ge \sum_{l=j_2}^{j} p_{+,l+1} \quad (j = j_2, \ldots, J-2).$$

Given the above two sets of constraints for the marginal probabilities, we construct the probability
matrix $P$ in three steps.

(1) We apply Lemma 1(b) to $(p_{0+}, \ldots, p_{j_2-1,+})$ and $(p_{+1}, \ldots, p_{+,j_2})$, and obtain an upper triangular
matrix $P_{tl} = (p_{kl})_{0 \le k \le j_2-1,\, 1 \le l \le j_2}$ with nonnegative elements such that

$$\sum_{l'=1}^{j_2} p_{kl'} = p_{k+}, \quad \sum_{k'=0}^{j_2-1} p_{k'l} \le p_{+l} \quad (k = 0, \ldots, j_2 - 1;\ l = 1, \ldots, j_2).$$

(2) We apply Lemma 1(d) to $(p_{j_2+}, \ldots, p_{J-2,+})$ and $(p_{+,j_2+1}, \ldots, p_{+,J-1})$, and obtain an upper
triangular matrix $P_{br} = (p_{kl})_{j_2 \le k \le J-2,\, j_2+1 \le l \le J-1}$ with nonnegative elements such that

$$\sum_{l'=j_2+1}^{J-1} p_{kl'} \le p_{k+}, \quad \sum_{k'=j_2}^{J-2} p_{k'l} = p_{+l} \quad (k = j_2, \ldots, J-2;\ l = j_2 + 1, \ldots, J-1).$$

(3) We construct $P_{bl} = (p_{kl})_{j_2 \le k \le J-1,\, 0 \le l \le j_2}$ by letting

$$p_{kl} = \frac{\big( p_{k+} - \sum_{l'=j_2+1}^{J-1} p_{kl'} \big) \big( p_{+l} - \sum_{k'=0}^{j_2-1} p_{k'l} \big)}{p_{+j_2} + \Delta_{j_2}} \ge 0 \quad (k = j_2, \ldots, J-1;\ l = 0, \ldots, j_2).$$

The constructed probability matrix $P$ has marginal probabilities $p_1 = (p_{0+}, \ldots, p_{J-1,+})^\top$ and
$p_0 = (p_{+0}, \ldots, p_{+,J-1})^\top$. Moreover, by (25) the corresponding $\tau$ is the sum of all the elements in
$P_{bl}$, which we construct in the above (3). Therefore,

$$\tau = 1 - \sum_{k'=0}^{j_2-1} p_{k'+} - \sum_{l'=j_2+1}^{J-1} p_{+l'} = \sum_{k'=j_2}^{J-1} p_{k'+} - \sum_{l'=j_2+1}^{J-1} p_{+l'} = p_{+j_2} + \Delta_{j_2},$$

which implies that $P$ attains $\tau_L$. □

References 

Agresti, A. (2010). Analysis of Ordinal Categorical Data, 2nd ed. Hoboken, New Jersey: John 
Wiley and Sons. 

Angrist, J. D., Imbens, G. W., and Rubin, D. B. (1996). Identification of causal effects using 
instrumental variables (with discussion). J. Am. Statist. Assoc., 91:444-455. 


Baker, S. G. (2011). Estimation and inference for the causal effect of receiving treatment on a 
multinomial outcome: An alternative approach. Biometrics, 67:319-323. 

Beran, R. (1988). Balanced simultaneous confidence sets. J. Am. Statist. Assoc., 83:679-697. 

Beran, R. (1990). Refining bootstrap simultaneous confidence sets. J. Am. Statist. Assoc., 85:417-426.

Bradley, R. A., Katti, S. K., and Coons, I. J. (1962). Optimal scaling for ordered categories. 
Psychometrika, 27:355-374. 

Cheng, J. (2009). Estimation and inference for the causal effect of receiving treatment on a
multinomial outcome. Biometrics, 65:96-103.

Cheng, J. and Small, D. S. (2006). Bounds on causal effects in three-arm trials with non-compliance. 
J. R. Statist. Soc. B, 68:815-836. 

Chernozhukov, V., Lee, S., and Rosen, A. (2013). Intersection bounds: estimation and inference. 
Econometrica, 81:667-737. 

Demidenko, E. (2016). The p-value you can't buy. Am. Statistician, in press.

Dempster, A. P., Laird, N., and Rubin, D. B. (1977). Maximum likelihood estimation from
incomplete data using the EM algorithm (with discussion). J. R. Statist. Soc. B, 39:1-38.

Diaz, I., Colantuoni, E., and Rosenblum, M. (2016). Enhanced precision in the analysis of
randomized trials with ordinal outcomes. Biometrics, 72:422-431.

Ding, P. and Dasgupta, T. (2016). A potential tale of two by two tables from completely randomized 
experiments. J. Am. Statist. Assoc., 111:157-168. 

Djebbari, H. and Smith, J. A. (2008). Heterogeneous impacts in PROGRESA. J. Econometrics, 
145:64-80. 

Fan, Y. and Park, S. S. (2010). Sharp bounds on the distribution of treatment effects and their 
statistical inference. Economet. Theor., 26:931-951. 


Fan, Y., Sherman, R., and Shum, M. (2014). Identifying treatment effects under data combination. 
Econometrica, 82:811-822. 

Frank, M. J., Nelsen, R. B., and Schweizer, B. (1987). Best-possible bounds for the distribution of 
a sum—a problem of Kolmogorov. Probab. Theor. Rel. , 74:199-211. 

Freedman, D. A. (2008). Randomization does not justify logistic regression. Statist. Sci., 23:237-249.

Frumento, P., Mealli, F., Pacini, B., and Rubin, D. B. (2012). Evaluating the effect of training 
on wages in the presence of noncompliance, nonemployment, and missing outcome data. J. Am. 
Statist. Assoc., 107:450-466. 

Gadbury, G. L. and Iyer, H. K. (2000). Unit-treatment interaction and its practical consequences. 
Biometrics, 56:882-885. 

Grilli, L. and Mealli, F. (2008). Nonparametric bounds on the causal effect of university studies on
job opportunities using principal stratification. J. Educ. Behav. Stat., 33:111-130.

Heckman, J. J., Smith, J., and Clements, N. (1997). Making the most out of programme evaluations 
and social experiments: Accounting for heterogeneity in programme impacts. Rev. Econ. Stud., 
64:487-535. 

Hirano, K., Imbens, G. W., and Ridder, G. (2003). Efficient estimation of average treatment effects 
using the estimated propensity score. Econometrica, 71:1161-1189. 

Hirano, K. and Porter, J. (2012). Impossibility results for nondifferentiable functionals.
Econometrica, 80:1769-1790.

Horowitz, J. L. and Manski, C. F. (2000). Nonparametric analysis of randomized experiments with 
missing covariate and outcome data. J. Am. Statist. Assoc., 95:77-84. 

Huang, E., Fang, E., Hanley, D., and Rosenblum, M. (2015). Inequality in treatment benefits: 
Can we determine if a new treatment benefits the many or the few. Working Paper 274, Johns 
Hopkins University, Department of Biostatistics. 

Ju, C. and Geng, Z. (2010). Criteria for surrogate end points based on causal distributions. J. R. 
Statist. Soc. B, 72:129-142. 


Lee, D. S. (2009). Training, wages, and sample selection: Estimating sharp bounds on treatment 
effects. Rev. Econ. Stud., 76:1071-1102. 

Long, D. M. and Hudgens, M. G. (2013). Sharpening bounds on principal effects with covariates. 
Biometrics, 69:812-819. 

Makarov, G. D. (1982). Estimates for the distribution function of a sum of two random variables 
when the marginal distributions are fixed. Theory Probab. Appl., 26:803-806. 

Mealli, F. and Pacini, B. (2013). Using secondary outcomes to sharpen inference in randomized 
experiments with noncompliance. J. Am. Statist. Assoc., 108:1120-1131. 

Nelsen, R. B. (2006). An Introduction to Copulas, Second Edition. New York: Springer. 

Newcombe, R. G. (2006a). Confidence intervals for an effect size measure based on the
Mann-Whitney statistic, part 1: general issues and tail-area-based methods. Stat. Med., 25:543-557.

Newcombe, R. G. (2006b). Confidence intervals for an effect size measure based on the
Mann-Whitney statistic, part 2: asymptotic methods and evaluation. Stat. Med., 25:559-573.

Neyman, J. (1923). On the application of probability theory to agricultural experiments, essay on 
principles. Section 9. Statist. Sci., 5:465-472. 

Pearl, J. (2009). Causality: Models, Reasoning and Inference, Second Edition. Cambridge
University Press.

Richardson, A., Hudgens, M. G., Gilbert, P. B., and Fine, J. P. (2014). Nonparametric bounds and 
sensitivity analysis of treatment effects. Stat. Sci., 29:596-618. 

Romano, J. P. and Shaikh, A. M. (2008). Inference for identifiable parameters in partially identified 
econometric models. J. Stat. Plan. Inference, 138:2786-2807. 

Romano, J. P. and Shaikh, A. M. (2010). Inference for the identified set in partially identified 
econometric models. Econometrica, 78:169-211. 

Rosenbaum, P. R. (2001). Effects attributable to treatment: inference in experiments and
observational studies within a discrete pivot. Biometrika, 88:219-231.


Rosenbaum, P. R. and Rubin, D. B. (1983). The central role of the propensity score in observational 
studies for causal effects. Biometrika, 70:41-55. 

Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized
studies. J. Educ. Psychol., 66:688-701.

Rubin, D. B. (1978). Bayesian inference for causal effects: the role of randomization. Ann. Stat., 
6:34-58. 

Rubin, D. B. (1980). Comment on “Randomization analysis of experimental data: the Fisher 
randomization test” by D. Basu. J. Am. Statist. Assoc., 75:591-593. 

Rüschendorf, L. (1982). Random variables with maximum sums. Adv. in Appl. Probab., 14:623-632.

Scharfstein, D. O., Manski, C. F., and Anthony, J. C. (2004). On the construction of bounds in 
prospective studies with missing ordinal outcomes: application to the good behavior game trial. 
Biometrics, 60:154-164. 

Schochet, P. Z., Cao, J. B. R., Glazerman, S., Grady, A., Gritz, M., McConnell, S., Johnson, T.,
and Burghardt, J. (2003). National job corps study: Data documentation and public use files,
Volume I. Documentation, Washington, DC, Mathematica Policy Research, Inc.

Strassen, V. (1965). The existence of probability measures with given marginals. Ann. Math. Stat., 
36:423-439. 

Volfovsky, A., Airoldi, E. M., and Rubin, D. B. (2015). Causal inference for ordinal outcomes. 
arXiv, 1501.01234. 

Yang, F. (2014). Causal inference methods for addressing censoring by death and unmeasured 
confounding using instrumental variables. Ph.D. Thesis, University of Pennsylvania. 

Yang, F. and Small, D. S. (2016). Using post-quality of life measurement information in censoring 
by death problems. J. R. Statist. Soc. B, 78:299-318. 

Zhang, J. L., Rubin, D. B., and Mealli, F. (2009). Likelihood-based analysis of causal effects of 
job-training programs using principal stratification. J. Am. Statist. Assoc., 104:166-176. 

Zhou, W. (2008). Statistical inference for P(X < Y ). Stat. Med., 27:257-279. 


Supplementary Material 

The Supplementary Material consists of two parts. In Part A, we prove Lemma 1 introduced in
the Appendix, and provide the proofs of all the theorems and corollaries in the main text, except
for Theorem 1. In Part B, we present the sufficient and necessary conditions for the bounds in
Theorem 1 to be the same.


A. Proof of Lemma, Theorems and Corollaries
A.1. Proof of Lemma 1


Proof of Lemma 1(a). We prove by induction. When n = 1, we let A\ = yo > 0, and Lemma 1(a) 
holds because yo < xq- When n > 2, suppose Lemma 1(a) holds for n — 1. In particular, for any 
(xi,..., x n -i) and (yi,... , y n -i) such that Ylr=s x r — J2r=s Vr f° r all s = 1 ,..., n — 1 , there exists 
a lower triangular matrix A „_\ = (aki)i<k,i<n-i with nonnegative elements such that 

71—1 71—1 

^2a k r<x k , ^2 a k 'i = yi {k,l = 1,. .. ,n - 1). (A.l) 

v=\ k '=l 


To prove that Lemma 1(a) holds for n, we let 


A 


n 


( 

aoo 


0 T 


^ a A n _i 


where aoo and a = (aio, ■ ■ ■ are defined for two separate cases below. 


(1) yo < xo- We let aoo = 2/o, and a k o = 0 for all k = 1,... ,n — 1. Clearly, A n has nonnegative 
elements, and satisfies the row and column sum conditions in Lemma 1(a) holds; 

(2) yo > xo- We let aoo = ^o, and 

sr^n—1 

a k o = (yo-aoo) - ^7 - — ° U ' -r>0 (k = 1,..., n - 1). (A. 2 ) 

k '=1 — 2-jI '=1 a k'l') 

This construction guarantees that the column sums of $A_n$ are the $y_l$'s. Furthermore, because
$A_{n-1}$ satisfies (A.1), we have

$$\begin{aligned}
\sum_{k'=1}^{n-1} \left( x_{k'} - \sum_{l'=1}^{n-1} a_{k'l'} \right)
&= \sum_{k'=1}^{n-1} x_{k'} - \sum_{k'=1}^{n-1} \sum_{l'=1}^{n-1} a_{k'l'}
 = \sum_{k'=1}^{n-1} x_{k'} - \sum_{l'=1}^{n-1} \sum_{k'=1}^{n-1} a_{k'l'} \\
&= \sum_{k'=1}^{n-1} x_{k'} - \sum_{k'=1}^{n-1} y_{k'} \ge y_0 - x_0 = y_0 - a_{00} > 0.
\end{aligned} \tag{A.3}$$

Formulas (A.2) and (A.3) imply that $a_{k0} \le x_k - \sum_{l'=1}^{n-1} a_{kl'}$, and therefore $\sum_{l'=0}^{n-1} a_{kl'} \le x_k$ for
$k = 1, \ldots, n-1$.


Therefore Lemma 1(a) holds for n, and the proof is complete. □ 
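
The induction in the proof of Lemma 1(a) is constructive, so it can be written as a short recursion. The Python sketch below is only an illustration under stated assumptions: the function name `lower_triangular_matrix` and the use of exact `Fraction` arithmetic are ours, not the paper's, and the inputs are assumed to be nonnegative and to satisfy the tail-sum condition of Lemma 1(a).

```python
from fractions import Fraction

def lower_triangular_matrix(x, y):
    """Hypothetical helper following the induction in the proof of Lemma 1(a).

    Assumes x and y are nonnegative with sum(x[s:]) >= sum(y[s:]) for all s.
    Returns a lower triangular matrix (list of rows) with nonnegative entries
    whose column sums equal y and whose row sums are at most x.
    """
    x = [Fraction(v) for v in x]
    y = [Fraction(v) for v in y]
    n = len(x)
    if n == 1:
        return [[y[0]]]                           # base case: A_1 = y_0
    sub = lower_triangular_matrix(x[1:], y[1:])   # induction hypothesis: A_{n-1}
    # slack[k] = x_{k+1} minus the corresponding row sum of A_{n-1}
    slack = [x[k + 1] - sum(sub[k]) for k in range(n - 1)]
    if y[0] <= x[0]:                              # case (1)
        a00 = y[0]
        first_col = [Fraction(0)] * (n - 1)
    else:                                         # case (2): spread y_0 - x_0 by slack
        a00 = x[0]
        total = sum(slack)                        # positive in this case
        first_col = [(y[0] - a00) * s / total for s in slack]
    top_row = [a00] + [Fraction(0)] * (n - 1)
    return [top_row] + [[first_col[k]] + sub[k] for k in range(n - 1)]

A = lower_triangular_matrix([1, 2, 3], [2, 1, 2])
```

For the inputs shown, the column sums of `A` reproduce `y` exactly and each row sum stays below the corresponding entry of `x`. When the two totals are equal, as in Lemma 1(e), the row sums match `x`.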

Proof of Lemma 1(b). By applying Lemma 1(a) to $(y_0, \ldots, y_{n-1})$ and $(x_0, \ldots, x_{n-1})$, we obtain a
lower triangular matrix $\tilde{B}_n = (\tilde{b}_{kl})_{0 \le k,l \le n-1}$ with nonnegative elements such that

$$\sum_{k'=0}^{n-1} \tilde{b}_{k'l} = x_l, \quad \sum_{l'=0}^{n-1} \tilde{b}_{kl'} \le y_k \quad (k, l = 0, \ldots, n-1).$$

Let $B_n = \tilde{B}_n^T$, and the proof is complete. □

Proof of Lemma 1(c). By applying Lemma 1(a) to $(y_{n-1}, \ldots, y_0)$ and $(x_{n-1}, \ldots, x_0)$, we obtain a
lower triangular matrix $\tilde{C}_n = (\tilde{c}_{kl})_{0 \le k,l \le n-1}$ with nonnegative elements such that

$$\sum_{k'=0}^{n-1} \tilde{c}_{k'l} = x_{n-l-1}, \quad \sum_{l'=0}^{n-1} \tilde{c}_{kl'} \le y_{n-k-1} \quad (k, l = 0, \ldots, n-1).$$

Let $C_n = (\tilde{c}_{n-1-l,\,n-1-k})_{0 \le k,l \le n-1}$, and the proof is complete. □

Proof of Lemma 1(d). By applying Lemma 1(c) to $(y_0, \ldots, y_{n-1})$ and $(x_0, \ldots, x_{n-1})$, we obtain a
lower triangular matrix $\tilde{D}_n = (\tilde{d}_{kl})_{0 \le k,l \le n-1}$ with nonnegative elements such that

$$\sum_{l'=0}^{k} \tilde{d}_{kl'} = y_k, \quad \sum_{k'=l}^{n-1} \tilde{d}_{k'l} \le x_l \quad (k, l = 0, \ldots, n-1).$$

Let $D_n = \tilde{D}_n^T$, and the proof is complete. □

Proof of Lemma 1(e). In addition to the proof of Lemma 1(a), we further need to show that if
$\sum_{r=0}^{n-1} y_r = \sum_{r=0}^{n-1} x_r$, the row sums of the constructed matrix $A_n$ are the $x_k$'s. In the induction of the
proof of Lemma 1(a), if we have constructed matrix $A_{n-1}$, the case with $y_0 < x_0$ would not happen.
We consider only the case with $y_0 \ge x_0$. Because the lower triangular matrix $A_{n-1}$ has the column
sums $y_l$'s, and $\sum_{r=0}^{n-1} y_r = \sum_{r=0}^{n-1} x_r$, we have

$$\sum_{k'=1}^{n-1} \left( x_{k'} - \sum_{l'=1}^{n-1} a_{k'l'} \right) = \sum_{k'=1}^{n-1} x_{k'} - \sum_{k'=1}^{n-1} y_{k'} = y_0 - x_0 = y_0 - a_{00} \ge 0.$$

The above formula, coupled with the construction of the first column of $A_n$ in (A.2), gives
$a_{k0} = x_k - \sum_{l'=1}^{n-1} a_{kl'}$, and thus $\sum_{l'=0}^{n-1} a_{kl'} = x_k$ for all $k$. □

A.2. Proofs of Other Theorems and Corollaries 

Proof of Theorem 2. Because $\eta = 1 - \mathrm{pr}\{Y_i(0) \ge Y_i(1)\}$, its lower bound is one minus the upper
bound of $\mathrm{pr}\{Y_i(0) \ge Y_i(1)\}$. By switching the treatment and control labels, we can bound
$\mathrm{pr}\{Y_i(0) \ge Y_i(1)\}$ from above by

$$\mathrm{pr}\{Y_i(0) \ge Y_i(1)\} \le 1 - \max_{0 \le j \le J-1} \Delta_j,$$

which implies that $\eta_L = \max_{0 \le j \le J-1} \Delta_j$.

Similarly, the upper bound of $\eta$ equals one minus the lower bound of $\mathrm{pr}\{Y_i(0) \ge Y_i(1)\}$. By
switching the treatment and control labels, we can bound $\mathrm{pr}\{Y_i(0) \ge Y_i(1)\}$ from below by

$$\mathrm{pr}\{Y_i(0) \ge Y_i(1)\} \ge \max_{0 \le j \le J-1} (p_{j+} - \Delta_j),$$

which implies that $\eta_U = 1 + \min_{0 \le j \le J-1} (\Delta_j - p_{j+})$. □
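
As a numerical illustration of the bounds, the following Python sketch evaluates the closed-form expressions from the two marginal distributions. It rests on assumptions not restated in this supplement: $\Delta_j$ is taken to be $\sum_{k \ge j} p_{k+} - \sum_{l \ge j} p_{+l}$, and the bounds for $\tau$ are taken to be $\tau_L = \max_j (p_{+j} + \Delta_j)$ and $\tau_U = 1 + \min_j \Delta_j$, which is what the label-switching argument above implies. The function name `sharp_bounds` is hypothetical.

```python
from fractions import Fraction

def sharp_bounds(p_treat, p_ctrl):
    """Hypothetical helper: bounds on tau = pr{Y(1) >= Y(0)} and eta = pr{Y(1) > Y(0)}.

    p_treat[k] is the marginal p_{k+} of Y(1); p_ctrl[l] is the marginal p_{+l} of Y(0).
    Assumes Delta_j = sum_{k>=j} p_{k+} - sum_{l>=j} p_{+l} (so Delta_0 = 0).
    """
    J = len(p_treat)
    delta = [sum(p_treat[j:]) - sum(p_ctrl[j:]) for j in range(J)]
    tau_L = max(p_ctrl[j] + delta[j] for j in range(J))
    tau_U = 1 + min(delta)
    eta_L = max(delta)
    eta_U = 1 + min(delta[j] - p_treat[j] for j in range(J))
    return tau_L, tau_U, eta_L, eta_U

# Binary outcome: pr{Y(1) = 1} = 3/5 and pr{Y(0) = 1} = 3/10.
p_treat = [Fraction(2, 5), Fraction(3, 5)]
p_ctrl = [Fraction(7, 10), Fraction(3, 10)]
tau_L, tau_U, eta_L, eta_U = sharp_bounds(p_treat, p_ctrl)
```

In this binary example the bounds reduce to the familiar Fréchet-type limits: $\tau \in [7/10, 1]$ and $\eta \in [3/10, 3/5]$.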

Proof of Theorem 3. With independent potential outcomes, the probability matrix $P$ has elements
$p_{kl} = p_{k+} p_{+l}$ for all $k$ and $l$. We obtain $\tau_I$ and $\eta_I$ by their definitions. Obviously, they are between
their lower and upper bounds, i.e., $\tau_L \le \tau_I \le \tau_U$ and $\eta_L \le \eta_I \le \eta_U$. □

Proof of Theorem 4. The proof follows Lee (2009). Because any value of $\tau$ within the covariate
adjusted bounds $[\tau'_L, \tau'_U]$ must be compatible with the distributions of $\{Y(1), X\}$ and $\{Y(0), X\}$,
it must also be compatible with the distributions of $Y(1)$ and $Y(0)$ by discarding $X$. Therefore, any
value of $\tau$ within the adjusted bounds must also be within the unadjusted bounds $[\tau_L, \tau_U]$.
Consequently, the adjusted bounds are tighter, i.e., $[\tau'_L, \tau'_U] \subseteq [\tau_L, \tau_U]$. Similar arguments apply to
the covariate adjusted bounds and the unadjusted bounds for $\eta$. □

Proof of Theorem 5. Under monotonicity, by the law of total probability, we have

$$\tau = \pi_c \tau_c + \pi_a \tau_a + \pi_n \tau_n.$$

Under exclusion restriction, we have $\tau_a = 1$ and $\tau_n = 1$, yielding

$$\tau = \pi_c \tau_c + 1 - \pi_c,$$

which implies that

$$\tau_c = \{\tau - (1 - \pi_c)\} / \pi_c.$$

Analogously, we have $\eta = \pi_c \eta_c$, which implies that $\eta_c = \eta / \pi_c$. □
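
The two identities in the proof of Theorem 5 amount to a simple rescaling, sketched below in Python. The function name `complier_effects` and the numerical inputs are hypothetical and serve only to illustrate the arithmetic; the sketch assumes monotonicity and the exclusion restriction hold and that $\pi_c > 0$.

```python
from fractions import Fraction

def complier_effects(tau, eta, pi_c):
    """Hypothetical helper mapping (tau, eta) to the complier parameters.

    Uses tau = pi_c * tau_c + 1 - pi_c and eta = pi_c * eta_c,
    which require monotonicity, exclusion restriction, and pi_c > 0.
    """
    if not 0 < pi_c <= 1:
        raise ValueError("pi_c must lie in (0, 1]")
    tau_c = (tau - (1 - pi_c)) / pi_c
    eta_c = eta / pi_c
    return tau_c, eta_c

# Illustrative values only: tau = 9/10, eta = 3/10, complier proportion 1/2.
tau_c, eta_c = complier_effects(Fraction(9, 10), Fraction(3, 10), Fraction(1, 2))
```

Because the map is increasing in $\tau$ and $\eta$, applying it to lower and upper bounds for $\tau$ and $\eta$ gives corresponding bounds for $\tau_c$ and $\eta_c$.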

Proof of Corollary 1. By Theorem 1, $\tau_U = 1$ if and only if $\min_{0 \le j \le J-1} \Delta_j = 0$. Because $\Delta_0 = 0$, this
is equivalent to $\Delta_j \ge 0$ for all $j$, i.e., the stochastic dominance assumption holds. □

Proof of Corollary 2. The proof follows directly from Theorems 1 and 2. □ 

Proof of Corollary 3. The closed-form expressions for $\tau''_L$, $\tau''_U$, $\eta''_L$ and $\eta''_U$ follow directly from
Theorem 5 and Corollary 2. Furthermore, under the monotonicity and exclusion restriction
assumptions, we have

$$\Delta_j = \pi_c \Delta_{cj} \quad (j = 0, \ldots, J-1).$$

Therefore, for the upper bound of $\tau$, we have

$$\tau_U = 1 - \pi_c + \pi_c \left( 1 + \min_j \Delta_{cj} \right) = \tau''_U,$$

and for the lower bound, we have

$$\tau_L \le \max_j \left( \pi_c c_{+j} + 1 - \pi_c + \pi_c \Delta_{cj} \right) = 1 - \pi_c + \pi_c \max_j \left( c_{+j} + \Delta_{cj} \right) = \tau''_L.$$

The first step holds because under the strong monotonicity assumption $n_{+j} \pi_n \le \pi_n$, and under
the monotonicity assumption $a_{+j} \pi_a + n_{+j} \pi_n \le \pi_a + \pi_n$. Similar arguments apply to the bounds of
$\eta$. □

B. Condition for Point Identification of $\tau$ and $\eta$

Under a certain condition, the lower and upper bounds of $\tau$ (or $\eta$) in Theorem 1 will be the same,
resulting in identification of the corresponding causal parameters. We formally state the sufficient
and necessary conditions for this to happen in the following theorem.

Theorem B.1. Let $\mathbb{K} = \{k : p_{k+} > 0\}$ and $\mathbb{L} = \{l : p_{+l} > 0\}$. The lower and upper bounds of $\tau$
are the same, if and only if there do not exist $k_1, k_2 \in \mathbb{K}$ and $l_1, l_2 \in \mathbb{L}$ such that

$$k_2 \ge l_2 > k_1 \ge l_1 \quad \text{or} \quad l_2 > k_2 \ge l_1 > k_1. \tag{B.1}$$

The lower and upper bounds of $\eta$ are the same, if and only if there do not exist $k_1, k_2 \in \mathbb{K}$ and
$l_1, l_2 \in \mathbb{L}$ such that

$$l_2 \ge k_2 > l_1 \ge k_1 \quad \text{or} \quad k_2 > l_2 \ge k_1 > l_1. \tag{B.2}$$


Proof. Similar to the proof of Theorem 2, because $\eta = 1 - \mathrm{pr}\{Y_i(0) \ge Y_i(1)\}$, (B.1) immediately
implies (B.2). Therefore, we need only to prove that (B.1) is the sufficient and necessary condition
that the lower and upper bounds of $\tau$ are the same, i.e., $\tau_L = \tau_U$.

First we prove the necessity of the condition. Assume that it does not hold, i.e., there do
exist $k_1, k_2 \in \mathbb{K}$ and $l_1, l_2 \in \mathbb{L}$ such that (B.1) holds. In this case we construct two probability
matrices with the same marginal probabilities but different values of $\tau$. The first probability matrix
is $P = (p_{k+} p_{+l})_{0 \le k,l \le J-1}$. For the second probability matrix, let $\varepsilon = \min(p_{k_1+} p_{+l_1}, p_{k_2+} p_{+l_2})$,
which is a positive constant. We then apply the following matrix operation to the $2 \times 2$ sub-matrix
of the first probability matrix:

$$\begin{pmatrix} p_{k_1 l_1} & p_{k_1 l_2} \\ p_{k_2 l_1} & p_{k_2 l_2} \end{pmatrix}
\longrightarrow
\begin{pmatrix} p_{k_1 l_1} - \varepsilon & p_{k_1 l_2} + \varepsilon \\ p_{k_2 l_1} + \varepsilon & p_{k_2 l_2} - \varepsilon \end{pmatrix}.$$

The above operation preserves the marginal probabilities, and the difference of $\tau$ between the first
and second probability matrices is $\varepsilon$, if $k_2 \ge l_2 > k_1 \ge l_1$, and $-\varepsilon$, if $l_2 > k_2 \ge l_1 > k_1$.

Second, we prove the sufficiency of the condition. If $|\mathbb{K}| = 1$ or $|\mathbb{L}| = 1$, the probability matrix
degenerates and consequently we have $\tau_L = \tau_U$. If $|\mathbb{K}| \ge 2$ and $|\mathbb{L}| \ge 2$, let $k_* = \min_{k \in \mathbb{K}} k$
and $k^* = \max_{k \in \mathbb{K}} k$ be the minimal and maximal indices of nonzero $p_{k+}$'s, and $l_* = \min_{l \in \mathbb{L}} l$ and
$l^* = \max_{l \in \mathbb{L}} l$ the minimal and maximal indices of nonzero $p_{+l}$'s. A useful fact that we repeatedly
use is that if $p_{k+} = 0$, then $p_{kl} = 0$ for all $l$. Similarly, if $p_{+l} = 0$, then $p_{kl} = 0$ for all $k$.

Because $k_*, k^*$ and $l_*, l^*$ cannot satisfy (B.1), we discuss the two following cases based on the
relative locations of the two intervals $[k_*, k^*]$ and $[l_*, l^*]$:

1. “Non-overlapping,” i.e., $k_* \ge l^*$ or $k^* < l_*$:

(a) If $k_* \ge l^*$, we prove that $p_{kl} = 0$ for all $k < l$. Assume the claim does not hold; then
there exists $k' < l'$ such that $p_{k'l'} > 0$, and hence $p_{k'+} > 0$ and $p_{+l'} > 0$. This implies that
$k_* \le k' < l' \le l^*$, contradicting the initial assumption. Therefore, $\tau_L = \tau_U = 1$.

(b) If $k^* < l_*$, similarly $p_{kl} = 0$ for all $k \ge l$, implying that $\tau_L = \tau_U = 0$.

2. “Inclusive,” i.e., $l^* > k^* > k_* \ge l_*$ or $k^* \ge l^* > l_* > k_*$:

(a) If $l^* > k^* > k_* \ge l_*$, and furthermore if there exists $l' \in \mathbb{L}$ such that $k_* < l' \le k^*$,
then $l' \ne l_*$ and $l' \ne l^*$. Moreover, $k_*, k^*$ and $l', l^*$ satisfy (B.1), contradicting the initial
assumption. Therefore for all $l \in \mathbb{L}$, $l \le k_*$ or $l > k^*$. Consequently,

$$\begin{aligned}
\tau &= \sum_{k \ge l} p_{kl} 1(k \in \mathbb{K}, l \in \mathbb{L}) = \sum_{k \ge l} p_{kl} 1(k \in \mathbb{K}, l \in \mathbb{L}, l \le k_*) \\
&= \sum_{k, l} p_{kl} 1(k \in \mathbb{K}, l \in \mathbb{L}, l \le k_*) = \sum_{l \le k_*, \, l \in \mathbb{L}} p_{+l}
\end{aligned}$$

is identifiable, which implies that $\tau_L = \tau_U$.

(b) If $k^* \ge l^* > l_* > k_*$, similarly as above for all $k \in \mathbb{K}$, $k < l_*$ or $k \ge l^*$. Consequently,

$$\tau = \sum_{k \ge l} p_{kl} 1(k \in \mathbb{K}, l \in \mathbb{L}) = \sum_{k, l} p_{kl} 1(k \in \mathbb{K}, l \in \mathbb{L}, k \ge l^*) = \sum_{k \ge l^*, \, k \in \mathbb{K}} p_{k+}$$

is identifiable, which implies that $\tau_L = \tau_U$.
□
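
The condition in Theorem B.1 is easy to check by enumeration. The Python sketch below is illustrative only and makes several assumptions: the function names are hypothetical, the inequalities in (B.1) are read as $k_2 \ge l_2 > k_1 \ge l_1$ or $l_2 > k_2 \ge l_1 > k_1$, and the bounds for $\tau$ are taken to be $\max_j (p_{+j} + \Delta_j)$ and $1 + \min_j \Delta_j$ with $\Delta_j = \sum_{k \ge j} p_{k+} - \sum_{l \ge j} p_{+l}$.

```python
from fractions import Fraction
from itertools import product

def condition_b1_holds(p_treat, p_ctrl):
    """True if some k1, k2 in K and l1, l2 in L satisfy (B.1), as read above."""
    K = [k for k, p in enumerate(p_treat) if p > 0]   # support of Y(1)
    L = [l for l, p in enumerate(p_ctrl) if p > 0]    # support of Y(0)
    for k1, k2, l1, l2 in product(K, K, L, L):
        if k2 >= l2 > k1 >= l1 or l2 > k2 >= l1 > k1:
            return True
    return False

def tau_bounds(p_treat, p_ctrl):
    """Lower and upper bounds of tau from the marginals (assumed closed form)."""
    J = len(p_treat)
    delta = [sum(p_treat[j:]) - sum(p_ctrl[j:]) for j in range(J)]
    return max(p_ctrl[j] + delta[j] for j in range(J)), 1 + min(delta)

half = Fraction(1, 2)
# Support of Y(1) is {1, 2}, strictly inside the support {0, 3} of Y(0).
nested = ([0, half, half, 0], [half, 0, 0, half])
# Non-degenerate binary outcomes under both arms.
binary = ([half, half], [half, half])

nested_identified = not condition_b1_holds(*nested)
binary_identified = not condition_b1_holds(*binary)
```

In the nested example no quadruple satisfies (B.1) and the two bounds coincide at $1/2$, which is the "Inclusive" case 2(a). In the binary example (B.1) holds and the bounds are $1/2$ and $1$, so $\tau$ is not point identified.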




